





Volume XIV JULY, 1922 Nos. 1 AND 2


BIOMETRIKA 


THE STANDARD DEVIATIONS OF FRATERNAL 
AND PARENTAL CORRELATION COEFFICIENTS. 


By KIRSTINE SMITH, D.Sc., Lond. 


CONTENTS.

Introduction
I. Fraternal Correlation
   (a) The mean value
   (b) The mean value of $\sigma^2$: the presumptive standard deviation
   (c) The standard deviation of $\sigma^2$
   (d) Mean value and standard deviation of the product moment $\Pi$
   (e) The product moment, $\Pi_{\Pi,\sigma^2}$, of $\Pi$ and $\sigma^2$
   (f) The standard deviation of the fraternal correlation coefficient
   (g) Numerical evaluation of the formula for the standard deviation of a fraternal correlation coefficient
   (h) Application of the formula to previous calculations of correlation
II. Parental Correlation
   (a) Mean value and standard deviation of the product moment $\Pi_{xy}$
   (b) The product moment, $\Pi_{\Pi_{xy},\sigma^2}$, of $\Pi_{xy}$ and $\sigma^2$
   (c) The product moment, $\Pi_{\Pi_{xy},\sigma'^2}$, of $\Pi_{xy}$ and $\sigma'^2$
   (d) The product moment, $\Pi_{\sigma^2,\sigma'^2}$, of $\sigma^2$ and $\sigma'^2$
   (e) The standard deviation of the parental correlation coefficient
   (f) The standard deviation of the slopes of the regression curves
   (g) Numerical evaluation of the formula for the standard deviation of a parental correlation coefficient
   (h) Application of the formula to previous calculations of correlation
Summary


INTRODUCTION. 


No attempts have been made as far as I know to calculate special formulae for 
the standard deviations of fraternal and parental correlation coefficients. The 
usual formula for the standard deviation of a correlation coefficient* which is 
deduced on the supposition that the values of the same variable are mutually 
uncorrelated is generally used also for this case, although it is only correct for a 


* Vide Pearson and Filon: Phil. Trans. Vol. 191 A, p. 229, 1898.


Biometrika xiv 








2 Fraternal and Parental Correlation Coefficients 

fraternal correlation coefficient calculated from only two siblings of each family and 
for a parental correlation coefficient when only one offspring value from each 
family enters into the calculation. When the material of observation, as is usually 
the case in investigations of inheritance in higher mammals, consists of families of 
varying size, and correlation tables are used in which the same weight is given to 
each observed pair of siblings or pair of parent and offspring, without regard to the 
size of the family, a rational treatment of the probable error is excluded at the 
outset. With material in hand which makes it possible to examine numerous 
siblings, it is most reasonable to confine the investigation to a constant number of 
offspring from each family. In this case the deduction of formulae for the standard 
deviations of the two correlation coefficients does not present special difficulties, and
this problem will be solved here. 


We shall suppose that each group of $q$ siblings belongs to the same litter, or
that for other reasons their order of birth is indifferent. Then each pair of
siblings and each pair of parent and offspring ought to take a like part in the
calculation, and $q$ siblings give rise to $\tfrac12 q(q-1)$ pairs of brothers and $q$ pairs of
parent and offspring, all of which are entered in the calculation.

The fraternal correlation can thus be calculated either from a correlation table
which is made symmetrical so that it contains $q(q-1)$ entries from each fraternity
or by the formula quoted p. 10 which gives an identical result.


I. FRATERNAL CORRELATION. 


Although this investigation aims especially at fraternal correlation it concerns 
of course other calculations of correlation in which the material consists of classes 
of equal size inside which the individuals are mutually correlated, all of them 
forming like parts. In the following we shall therefore name a group of siblings 
a class. 

Suppose we have a material consisting of q individuals from each of n classes 
inside which the individuals are correlated while individuals from different classes 
are uncorrelated. We can then consider such a material as one of many possible 
samples of the same nature and size drawn from a population consisting of classes 
of individuals correlated as mentioned. It is therefore possible to face the problem 
of finding the law of errors for the mean value, the standard deviation of the 
character concerned and further for the correlation coefficient inside a class, supposing 
that these are all calculated from a sample like the one now considered. 


Let the sample be $y_1, y_2, y_3, \dots, y_{nq}$ with mean value $\bar y$ and standard deviation $\sigma$.
No special notation will be introduced for individuals of the same class, but summation
of products is indicated by $\Sigma$ when all factors of the product belong to the
same class, and by $S$ when factors of the same product belong to two or more
classes. The summations always extend to all $n$ classes.























KIRSTINE SMITH 


(a) The Mean Value. 


For the sample in hand we have

$$\bar y = \frac{1}{nq}\,\Sigma(y_1).$$

The mean value of $\bar y$ for a great number of samples coincides, according to the
suppositions, with the mean of the population, and this we choose for the zero point
of $y$. The squared standard deviation of $\bar y$ is therefore found simply by squaring
the expression above, summing for all the samples imagined and taking the mean
value of the result. We thus find

$$\sigma_{\bar y}^2 = \frac{1}{n^2q^2}\left\{\overline{\Sigma(y_1^2)} + 2\,\overline{\Sigma(y_1y_2)} + 2\,\overline{S(y_1y_2)}\right\},$$

where a bar above a summation indicates that the mean value has to be taken
of the sums for all samples, i.e. for the population. Let the standard deviation
of the population be $s$ and the correlation coefficient for individuals of the same
class $r$; we then have

$$\overline{\Sigma(y_1^2)} = nqs^2 \quad\text{and}\quad \overline{\Sigma(y_1y_2)} = \tfrac12\,nq(q-1)\,rs^2.$$

As individuals of different classes are uncorrelated, $\overline{S(y_1y_2)}$ is equal to 0, and
accordingly we find

$$\sigma_{\bar y}^2 = \frac{s^2}{nq}\,\{1+(q-1)r\} \tag{1}.$$

This contains $s$ and $r$ for the population, which are, as a rule, only known from
the sample in hand. It will be seen in the following what approximation is
obtained by putting $s$ and $r$ equal to the values found from the sample.
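Formula (1) can be checked by summing the covariances of all pairs of observations directly. The following is a minimal sketch, not part of the original memoir; the function names are hypothetical, and it assumes only the covariance structure stated above ($s^2$ on the diagonal, $rs^2$ inside a class, 0 between classes).

```python
def var_of_mean_formula(s, r, n, q):
    """Formula (1): squared s.d. of the mean of n classes of q individuals."""
    return s**2 / (n * q) * (1 + (q - 1) * r)


def var_of_mean_bruteforce(s, r, n, q):
    """Sum the covariance of every ordered pair of observations, divide by (nq)^2."""
    total_obs = n * q
    total = 0.0
    for i in range(total_obs):
        for j in range(total_obs):
            if i == j:
                total += s * s          # variance of one individual
            elif i // q == j // q:
                total += r * s * s      # two individuals of the same class
            # individuals of different classes contribute nothing
    return total / total_obs**2


print(var_of_mean_formula(1.5, 0.4, 4, 3))
print(var_of_mean_bruteforce(1.5, 0.4, 4, 3))
```

For $r=0$ or $q=1$ both functions give the familiar $s^2$ divided by the number of observations.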


(b) The Mean Value of $\sigma^2$: the presumptive Standard Deviation.
For our sample we find

$$\sigma^2 = \frac{1}{nq}\,\Sigma(y_1^2) - \bar y^2 \tag{2}.$$

By taking the mean of $\sigma^2$ for a great number of samples we find from this,
remembering that the mean of $\bar y^2$ equals $\sigma_{\bar y}^2$,

$$\overline{\sigma^2} = s^2\left(1 - \frac{1+(q-1)r}{nq}\right) \tag{3}.$$

When we take the value found for $\sigma^2$ as an approximation to $\overline{\sigma^2}$, we find accordingly
the presumptive value of the standard deviation of the population by the
formula*

$$s^2 = \sigma^2\,\frac{nq}{nq-\{1+(q-1)r\}},$$

which for $r=0$ or $q=1$ takes the form known for uncorrelated observations.

* Vide Comptes-Rendus des Trav. du Lab. Carlsberg, Vol. XIV. No. 11, 1921, Copenhagen, p. 32.

For the s.d. of $\bar y$ we find by introducing $s$ in (1)

$$\sigma_{\bar y}^2 = \sigma^2\,\frac{1+(q-1)r}{nq-\{1+(q-1)r\}}.$$


(c) The Standard Deviation of $\sigma^2$.

The s.d. of the $\sigma^2$ of our sample is found from $\sigma_{\sigma^2}^2 = \overline{\sigma^4} - (\overline{\sigma^2})^2$, where the latter
term is already known. From (2) we find for the calculation of $\overline{\sigma^4}$

$$n^2q^2\sigma^2 = (nq-1)\,\Sigma(y_1^2) - 2\,\Sigma(y_1y_2) - 2\,S(y_1y_2) \tag{4},$$

and from this

$$\begin{aligned} n^4q^4\sigma^4 ={}& (nq-1)^2\,(\Sigma(y_1^2))^2 + 4\,(\Sigma(y_1y_2))^2 + 4\,(S(y_1y_2))^2 \\ &- 4(nq-1)\,\Sigma(y_1^2)\,\Sigma(y_1y_2) - 4(nq-1)\,\Sigma(y_1^2)\,S(y_1y_2) + 8\,\Sigma(y_1y_2)\,S(y_1y_2) \end{aligned} \tag{5}.$$

For the calculation of the mean values contained in this equation, the six products
of product sums must be examined. We find

$$\begin{aligned} (\Sigma(y_1^2))^2 &= \Sigma(y_1^4) + 2\,\Sigma(y_1^2y_2^2) + 2\,S(y_1^2y_2^2) \\ (\Sigma(y_1y_2))^2 &= \Sigma(y_1^2y_2^2) + 2\,\Sigma(y_1^2y_2y_3) + 6\,\Sigma(y_1y_2y_3y_4) + 2\,S(\underset{s}{y_1y_2}\,\underset{s}{y_3y_4})^{*} \\ \Sigma(y_1^2)\,\Sigma(y_1y_2) &= \Sigma(y_1^3y_2) + \Sigma(y_1^2y_2y_3) + S(y_1^2\,\underset{s}{y_2y_3}) \end{aligned} \tag{6}.$$

When the multiplication of products containing the factor $S(y_1y_2)$ is carried out,
it is clear that we need not consider such sums of products where the product contains
a factor which is uncorrelated with all the other factors of the product,
because the mean values of such product sums are 0. In the products $\Sigma(y_1^2)\,S(y_1y_2)$
and $\Sigma(y_1y_2)\,S(y_1y_2)$ all the sums of products are of this kind, the factors being distributed
either in two classes of which one contains 3 and the other 1 factor or in
three classes with respectively 2, 1 and 1 in each.

We therefore find

$$\begin{aligned} (S(y_1y_2))^2 &= S(y_1^2y_2^2) + 2\,S(y_1^2\,\underset{s}{y_2y_3}) + 4\,S(\underset{s}{y_1y_2}\,\underset{s}{y_3y_4}) + \alpha_1 \\ \Sigma(y_1^2)\,S(y_1y_2) &= \alpha_2 \\ \Sigma(y_1y_2)\,S(y_1y_2) &= \alpha_3 \end{aligned} \tag{7},$$

where the mean values of the $\alpha$'s for the population are 0.

Let us denote the product moment corresponding to $y_1^m y_2^n y_3^p y_4^q$ by $\beta_{mnpq}$ if
all factors belong to the same class, and in the opposite case let us insert 'd' or 's'
as denoting different or same class.

* In the sums $S$ all factors of a product are supposed to belong to different classes except those which
are denoted by an 's' inserted between them, as belonging to the same class.




























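We find then

$$\begin{aligned} \overline{\Sigma(y_1^4)} &= nq\,\beta_4 \\ \overline{\Sigma(y_1^3y_2)} &= nq(q-1)\,\beta_{31} \\ \overline{\Sigma(y_1^2y_2^2)} &= \tfrac12\,nq(q-1)\,\beta_{22} \\ \overline{\Sigma(y_1^2y_2y_3)} &= \tfrac12\,nq(q-1)(q-2)\,\beta_{211} \\ \overline{\Sigma(y_1y_2y_3y_4)} &= \tfrac1{24}\,nq(q-1)(q-2)(q-3)\,\beta_{1111} \\ \overline{S(y_1^2y_2^2)} &= \tfrac12\,n(n-1)q^2\,\underset{d}{\beta_{22}} \\ \overline{S(y_1^2\,\underset{s}{y_2y_3})} &= \tfrac12\,n(n-1)q^2(q-1)\,\underset{ds}{\beta_{211}} \\ \overline{S(\underset{s}{y_1y_2}\,\underset{s}{y_3y_4})} &= \tfrac18\,n(n-1)q^2(q-1)^2\,\underset{sds}{\beta_{1111}} \end{aligned} \tag{8}.$$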


Till now no suppositions have been made as to the law of distribution of the
$y$'s, but in the following calculation we shall suppose that the distribution is normal
and the correlation between individuals of the same class normal.


For the general case of normal correlation between $n$ variables the product
moments have been determined by Sverker Bergström*. Taking the standard
deviations as units of the variable and denoting the correlation coefficients by
$r_{12}, r_{23}, \dots$, where for instance $r_{23}$ means the correlation coefficient between the 2nd
and 3rd variable of a product moment $\beta'_{mnpq}$, he finds the following formulae for
the product moments of the 4th order:

$$\begin{aligned} \beta'_4 &= 3 \\ \beta'_{31} &= \beta'_{13} = 3r_{12} \\ \beta'_{22} &= 2r_{12}^2 + 1 \\ \beta'_{211} &= 2r_{12}r_{13} + r_{23} \\ \beta'_{1111} &= r_{12}r_{34} + r_{13}r_{24} + r_{14}r_{23} \end{aligned} \tag{9}.$$

Substituting our special values for the correlation coefficient we find

$$\begin{aligned} \beta_4 &= 3s^4 \\ \beta_{31} &= 3rs^4 \\ \beta_{22} &= (2r^2+1)\,s^4 \\ \beta_{211} &= r(1+2r)\,s^4 \\ \beta_{1111} &= 3r^2s^4 \end{aligned}$$

and further

$$\begin{aligned} \underset{d}{\beta_{22}} &= s^4 \\ \underset{ds}{\beta_{211}} &= rs^4 \\ \underset{sds}{\beta_{1111}} &= r^2s^4 \end{aligned} \tag{10}.$$

We are now by means of (8) and (10) in a position to evaluate the mean values
of the products put down under (6) and (7).

* Vide S. Bergström: Biometrika, Vol. XII. 1918, p. 177.











We find

$$\begin{aligned} \overline{(\Sigma(y_1^2))^2} &= nq\,\{nq+2+2(q-1)r^2\}\,s^4 \\ \overline{(\Sigma(y_1y_2))^2} &= \tfrac12\,nq(q-1)\,\{1+2(q-2)r+[\tfrac12 nq(q-1)+q^2-3q+3]\,r^2\}\,s^4 \\ \overline{\Sigma(y_1^2)\,\Sigma(y_1y_2)} &= \tfrac12\,nq(q-1)\,\{nq+4+2(q-2)r\}\,rs^4 \\ \overline{(S(y_1y_2))^2} &= \tfrac12\,n(n-1)q^2\,\{1+(q-1)r\}^2\,s^4 \\ \overline{\Sigma(y_1^2)\,S(y_1y_2)} &= 0 \quad\text{and}\quad \overline{\Sigma(y_1y_2)\,S(y_1y_2)} = 0 \end{aligned} \tag{11}.$$

The calculation of $\overline{\sigma^4}$ may now be continued. We find, by substituting the
above mean values in (5),

$$n^2q^2\,\overline{\sigma^4} = s^4\,\{n^2q^2 - 1 - 2(nq+1)(q-1)\,r + (q-1)[2nq-(q-1)]\,r^2\}.$$

From (3) is found

$$n^2q^2\,(\overline{\sigma^2})^2 = s^4\,\{n^2q^2 - 2nq + 1 - 2(nq-1)(q-1)\,r + (q-1)^2r^2\},$$

and accordingly

$$\sigma_{\sigma^2}^2 = \overline{\sigma^4} - (\overline{\sigma^2})^2 = \frac{2s^4}{n^2q^2}\,\{nq - 1 - 2(q-1)\,r + (q-1)(nq-q+1)\,r^2\},$$

or arranged according to powers of $nq$

$$\sigma_{\sigma^2}^2 = \frac{2s^4}{nq}\left\{1+(q-1)\,r^2 - \frac{1}{nq}\,[1+r(q-1)]^2\right\} \tag{12}.$$

This formula for the s.d. of the squared standard deviations is thus exact,
supposing that the correlation be normal.
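The exactness of (12) can be verified numerically for a small sample. The sketch below is an illustration and not the author's method: it relies on the standard identity that, for zero-mean normal $y$ with covariance matrix $C$ and a symmetric matrix $M$, the variance of the quadratic form $y'My$ is $2\,\mathrm{tr}\{(MC)^2\}$. The function names are hypothetical.

```python
def var_sigma2_formula(s, r, n, q):
    """Formula (12): squared s.d. of the sample variance sigma^2."""
    nq = n * q
    return 2 * s**4 / nq * (1 + (q - 1) * r**2 - (1 + (q - 1) * r) ** 2 / nq)


def var_sigma2_bruteforce(s, r, n, q):
    """2 * trace((M C)^2), where sigma^2 = y' M y and C is the class covariance."""
    size = n * q
    # Covariance: s^2 on the diagonal, r*s^2 inside a class, 0 between classes.
    cov = [[s * s if i == j else (r * s * s if i // q == j // q else 0.0)
            for j in range(size)] for i in range(size)]
    # sigma^2 = (1/N) * sum(y^2) - ybar^2 corresponds to M = I/N - J/N^2.
    m = [[(1.0 / size if i == j else 0.0) - 1.0 / size**2
          for j in range(size)] for i in range(size)]
    mc = [[sum(m[i][k] * cov[k][j] for k in range(size))
           for j in range(size)] for i in range(size)]
    trace_sq = sum(mc[i][j] * mc[j][i] for i in range(size) for j in range(size))
    return 2 * trace_sq


print(var_sigma2_formula(1.0, 0.5, 4, 3))
print(var_sigma2_bruteforce(1.0, 0.5, 4, 3))
```

The two values agree to rounding error, and for $q=1$ the formula reduces to $2s^4(n-1)/n^2$.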
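For great values of $n$, or rather of $\dfrac{nq}{1+(q-1)r}$, we may consider the s.d. of $\sigma^2$ a
differential, so that

$$\sigma^2 = \overline{\sigma^2} + \delta\sigma^2 = \overline{\sigma^2} + 2\bar\sigma\,\delta\sigma.$$

From $\delta\sigma^2 = 2\bar\sigma\,\delta\sigma$ we find by squaring and taking mean value for a great number
of samples,

$$\sigma_{\sigma^2}^2 = 4\,\overline{\sigma^2}\,\sigma_\sigma^2,$$

and by substituting the value of $\sigma_{\sigma^2}^2$ from (12), omitting the last term,

$$\overline{\sigma^2}\,\sigma_\sigma^2 = \frac{s^4}{2nq}\,\{1+(q-1)\,r^2\},$$

or, as with the accuracy obtainable we have $s^2 = \overline{\sigma^2}$, it follows that

$$\sigma_\sigma^2 = \frac{\overline{\sigma^2}}{2nq}\,\{1+(q-1)\,r^2\}.$$

We notice when comparing this formula with (1) that only for $r=1$ and $r=0$
does the rule

$$\sigma_\sigma^2 = \tfrac12\,\sigma_{\bar y}^2$$

hold good.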







The fraternal correlation coefficient $\rho$ for the present sample is, when all the
$\tfrac12 q(q-1)$ pairs of siblings are used for the calculation, defined by

$$\rho = \frac{\Pi}{\sigma^2}, \quad\text{where}\quad \Pi = \frac{2}{nq(q-1)}\,\Sigma(y_1y_2) - \bar y^2 \tag{13}.$$

To determine the s.d. of $\rho$ one requires, in addition to $\sigma_{\sigma^2}$, the s.d. of $\Pi$ and the
product moment for $\Pi$ and $\sigma^2$.


(d) Mean Value and Standard Deviation of the Product Moment $\Pi$.
Taking mean value of (13) for a great number of samples we find, as
$\overline{\Sigma(y_1y_2)} = \tfrac12\,nq(q-1)\,rs^2$ and $\overline{\bar y^2} = \sigma_{\bar y}^2$,

$$\overline{\Pi} = \left\{r - \frac{1}{nq}\,[1+(q-1)r]\right\}s^2 \tag{14}.$$

For calculating the mean value of $\Pi^2$, (13) may be written

$$n^2q^2(q-1)\,\Pi = -(q-1)\,\Sigma(y_1^2) + 2(nq-q+1)\,\Sigma(y_1y_2) - 2(q-1)\,S(y_1y_2) \tag{15},$$

from which follows

$$\begin{aligned} n^4q^4(q-1)^2\,\overline{\Pi^2} ={}& (q-1)^2\,\overline{(\Sigma(y_1^2))^2} + 4(nq-q+1)^2\,\overline{(\Sigma(y_1y_2))^2} \\ &- 4(q-1)(nq-q+1)\,\overline{\Sigma(y_1^2)\,\Sigma(y_1y_2)} + 4(q-1)^2\,\overline{(S(y_1y_2))^2}, \end{aligned}$$

the mean values of the two remaining products being 0 according to (11). Substituting the
rest of the values from (11) we find

$$(q-1)\,n^2q^2\,\overline{\Pi^2} = s^4\,\{2nq-(q-1) + 2r\,[nq(q-3)-(q-1)^2] + r^2\,[n^2q^2(q-1) - 2nq(q-2) - (q-1)^3]\} \tag{16},$$

and by squaring (14) is found

$$(q-1)\,n^2q^2\,(\overline{\Pi})^2 = s^4\,\{q-1 - 2r\,[nq(q-1)-(q-1)^2] + r^2\,[n^2q^2(q-1) - 2nq(q-1)^2 + (q-1)^3]\}.$$

By subtraction of this equation from (16) we arrive at

$$\sigma_\Pi^2 = \overline{\Pi^2} - (\overline{\Pi})^2 = \frac{2s^4}{nq(q-1)}\left\{1 - \frac{q-1}{nq} + 2r\left[q-2-\frac{(q-1)^2}{nq}\right] + r^2\left[q^2-3q+3-\frac{(q-1)^3}{nq}\right]\right\},$$

or arranged according to $nq$

$$\sigma_\Pi^2 = \frac{2s^4}{nq(q-1)}\left\{1 + 2r(q-2) + r^2(q^2-3q+3) - \frac{q-1}{nq}\,[1+r(q-1)]^2\right\},$$

which may also be written

$$\sigma_\Pi^2 = \frac{2s^4}{nq(q-1)}\left\{[1+r(q-2)]^2 + r^2(q-1) - \frac{q-1}{nq}\,[1+r(q-1)]^2\right\} \tag{17}.$$








(e) The Product Moment, $\Pi_{\Pi,\sigma^2}$, of $\Pi$ and $\sigma^2$.

By multiplication of (4) and (15) and taking mean value for a great number of
samples we find for the mean value of the product $\Pi\sigma^2$

$$\begin{aligned} n^4q^4(q-1)\,\overline{\Pi\sigma^2} ={}& -(nq-1)(q-1)\,\overline{(\Sigma(y_1^2))^2} - 4(nq-q+1)\,\overline{(\Sigma(y_1y_2))^2} \\ &+ 2\,\{n^2q^2 - nq^2 + 2(q-1)\}\,\overline{\Sigma(y_1^2)\,\Sigma(y_1y_2)} + 4(q-1)\,\overline{(S(y_1y_2))^2}, \end{aligned}$$

the mean values of the two remaining products being zero according to (11). Introducing the
rest of the mean values from (11), we have

$$n^2q^2\,\overline{\Pi\sigma^2} = s^4\,\{-nq-1 + r\,[n^2q^2 - nq(q-4) - 2(q-1)] + r^2\,[nq(q-3)-(q-1)^2]\}.$$

From (3) and (14) is found

$$n^2q^2\,\overline{\Pi}\cdot\overline{\sigma^2} = s^4\,\{-nq+1 + r\,[n^2q^2 - nq^2 + 2(q-1)] + r^2\,[-nq(q-1)+(q-1)^2]\}.$$

As $\Pi_{\Pi,\sigma^2} = \overline{\Pi\sigma^2} - \overline{\Pi}\cdot\overline{\sigma^2}$, it follows from the two foregoing equations that

$$\Pi_{\Pi,\sigma^2} = \frac{2s^4}{n^2q^2}\,\{-1 + 2r\,[nq-(q-1)] + r^2\,[nq(q-2)-(q-1)^2]\},$$

or

$$\Pi_{\Pi,\sigma^2} = \frac{2s^4}{nq}\left\{2r + (q-2)\,r^2 - \frac{1}{nq}\,[1+(q-1)r]^2\right\} \tag{18}.$$


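(f) The Standard Deviation of the Fraternal Correlation Coefficient.
If the sample is great in proportion to $(q-1)r$ the errors of $\Pi$ and $\sigma^2$ can be
treated as differentials and we have for the correlation coefficient calculated from a
sample

$$\rho = \frac{\overline{\Pi}+\delta\Pi}{\overline{\sigma^2}+\delta\sigma^2} = \frac{\overline{\Pi}}{\overline{\sigma^2}} + \frac{1}{\overline{\sigma^2}}\,\delta\Pi - \frac{\overline{\Pi}}{(\overline{\sigma^2})^2}\,\delta\sigma^2,$$

and

$$\bar\rho = \frac{\overline{\Pi}}{\overline{\sigma^2}} = \frac{r - \dfrac{1}{nq}\,\{1+(q-1)r\}}{1 - \dfrac{1}{nq}\,\{1+(q-1)r\}},$$

and therefore, neglecting the term containing $\dfrac{1}{nq}$ which according to these suppositions
cannot be evaluated,

$$\bar\rho = r.$$

From $\delta\rho = \dfrac{1}{\overline{\sigma^2}}\left(\delta\Pi - \dfrac{\overline{\Pi}}{\overline{\sigma^2}}\,\delta\sigma^2\right)$ we find by squaring and forming mean value

$$\sigma_\rho^2 = \frac{1}{(\overline{\sigma^2})^2}\left\{\sigma_\Pi^2 + \left(\frac{\overline{\Pi}}{\overline{\sigma^2}}\right)^2\sigma_{\sigma^2}^2 - 2\,\frac{\overline{\Pi}}{\overline{\sigma^2}}\,\Pi_{\Pi,\sigma^2}\right\}.$$

When the values from (3), (12), (14), (17) and (18) are introduced in this
formula and the terms containing the higher power of $\dfrac{1}{nq}$ are neglected, we get

$$\sigma_\rho^2 = \frac{2}{nq(q-1)}\,\{[1+r(q-2)]^2 + r^2(q-1)\} + \frac{2r^2}{nq}\,\{1+(q-1)\,r^2\} - \frac{4r^2}{nq}\,\{2+(q-2)\,r\},$$

from which is found

$$\sigma_\rho^2 = \frac{2}{nq(q-1)}\,\{1 + r(q-2) - r^2(q-1)\}^2$$

and

$$\sigma_\rho = \sqrt{\frac{2}{nq(q-1)}}\;(1-r)\,\{1+(q-1)r\} \tag{19}.$$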


For q=2 this formula coincides with the usual formula for the standard 
deviation of a correlation coefficient calculated from two series of values of two 
variables corresponding in pairs, the values of each series being mutually uncorre- 
lated. 
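The reduction of (19) to the usual formula for $q=2$ is easily checked numerically. The following is a minimal sketch with hypothetical function names; `sd_usual` is the familiar $(1-r^2)/\sqrt{N}$ for $N$ uncorrelated pairs, and for $q=2$ the number of pairs equals the number of families $n$.

```python
import math


def sd_fraternal(r, n, q):
    """Formula (19): s.d. of the fraternal correlation coefficient,
    n families of q siblings each, population correlation r."""
    return math.sqrt(2.0 / (n * q * (q - 1))) * (1 - r) * (1 + (q - 1) * r)


def sd_usual(r, pairs):
    """Usual large-sample s.d. of a correlation coefficient from uncorrelated pairs."""
    return (1 - r**2) / math.sqrt(pairs)


# With two siblings per family each family contributes exactly one pair.
print(sd_fraternal(0.5, 200, 2), sd_usual(0.5, 200))
```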


(g) Numerical Evaluation of the Formula for the s.d. of a Fraternal
Correlation Coefficient.

The number, $N$, of observed pairs of observations being equal to $\tfrac12\,nq(q-1)$, the
formula (19) may also be written

$$\sigma_\rho = \frac{1}{\sqrt N}\,(1-r)\,\{1+(q-1)r\}.$$

Comparing materials of observations with different numbers of siblings $q$, we
see that for the calculation of fraternal correlation the information of each available
pair of siblings has a value inversely proportional to $\{1+(q-1)r\}^2$. The ratio
$v_q = \left(\dfrac{1+r}{1+(q-1)r}\right)^2$ serves as a measure for the value which must be attributed to
the information of an observed pair among $q$ siblings, supposing that all of the $\tfrac12\,nq(q-1)$
pairs of siblings are used for the calculation, and supposing that the value of the information
of a pair of siblings for $q=2$ is put equal to 1. On the other hand $\dfrac{1}{v_q}$
indicates the ratio between the numbers of pairs of siblings which are required for
obtaining the same accuracy in the correlation coefficient in the case of $q$ and in
the case of two siblings from each family. Table I gives the numerical values of $v_q$
for different values of $r$ and $q$.


























TABLE I.

$$v_q = \left(\frac{1+r}{1+(q-1)r}\right)^2$$

| q  | r = 0.1 | 0.2   | 0.3   | 0.4   | 0.5   | 0.6   | 0.7   | 0.8   | 0.9   |
|----|---------|-------|-------|-------|-------|-------|-------|-------|-------|
| 2  | 1.000   | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 3  | .840    | .735  | .660  | .605  | .563  | .529  | .502  | .479  | .460  |
| 4  | .716    | .563  | .468  | .405  | .360  | .327  | .301  | .280  | .264  |
| 5  | .617    | .444  | .349  | .290  | .250  | .221  | .200  | .184  | .171  |
| 6  | .538    | .360  | .270  | .218  | .184  | .160  | .143  | .130  | .119  |
| 7  | .473    | .298  | .216  | .170  | .141  | .121  | .107  | .096  | .088  |
| 8  | .419    | .250  | .176  | .136  | .111  | .095  | .083  | .074  | .068  |
| 9  | .373    | .213  | .146  | .111  | .090  | .076  | .066  | .059  | .054  |
| 10 | .335    | .184  | .123  | .093  | .074  | .063  | .054  | .048  | .044  |
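The entries of Table I follow directly from the definition of $v_q$. As a hedged illustration (the function name `information_ratio` is hypothetical), a few lines suffice to regenerate any row:

```python
def information_ratio(r, q):
    """v_q = ((1 + r) / (1 + (q - 1) r))^2: value of one sibling pair among q
    siblings, relative to a pair from a family of two."""
    return ((1 + r) / (1 + (q - 1) * r)) ** 2


# Regenerate the rows q = 3, 5 and 10 for r = 0.1, 0.2, ..., 0.9.
for q in (3, 5, 10):
    row = [round(information_ratio(k / 10, q), 3) for k in range(1, 10)]
    print(q, row)
```

The reciprocal `1 / information_ratio(r, q)` gives the factor by which the number of pairs must be increased to match the accuracy obtained with two siblings per family.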



















We notice that for values of r somewhat greater than 0.5, such as are usually
found for mammals, v; has already decreased to about $ and vy, to about 4. By 
giving the same weight to each pair of siblings when forming fraternal correlation 
tables from a material consisting of fraternities of different size, we therefore fail
very largely to pay due regard to the observations. With material under consideration,
as for example anthropometric data, which according to its nature consists of
small groups of siblings of varying number, and which is not so numerous that we
can afford to omit observations from the calculation to make $q$ constant for all
fraternities, the rational proceeding must be to sort the material according to the
number of siblings and calculate the correlation coefficient of each group separately.

It is then possible to effect considerable saving of time and labour in the
investigation of correlation by avoiding the forming of fraternal correlation tables
and using instead the formula*

$$r = \frac{1}{q-1}\left(q\,\frac{\sigma_q^2}{\sigma^2} - 1\right),$$

where $\sigma_q$ is the directly calculated s.d. for mean values of fraternities. The results
found by the formula are identical with those of the defining formula, so that the
only objection to this method of calculation is the lack of opportunity to examine
the shape of the regression curve.

From the correlation coefficients found for different values of $q$†, it is finally
possible with knowledge of their s.d.'s to calculate a mean value of the fraternal
correlation coefficient and its s.d.

In investigations of inheritance with animals with numerous offspring, where a 
great number of siblings are available, we have to face the problem of deciding 
what number of siblings it is profitable to employ for the investigation. 

We shall state provisionally the problem as follows: with which value of $q$ do
we, provided the number of examined offspring individuals ($nq$) be fixed, obtain the
most accurately determined fraternal correlation coefficient? Or in other words, for
which value of $q$ is

$$\frac{1}{\sqrt{q-1}}\,\{1+r(q-1)\}$$

a minimum?

* Vide K. Smith, Comptes-Rendus des Trav. du Lab. Carlsb. Vol. XIV. No. 11, 1921, p. 8, where the
formula is deduced for the special case $q=10$.

† In the memoir quoted it is shewn (p. 29) that the above formula may also be written

$$r = 1 - \frac{q}{q-1}\,\frac{\sigma_{sq}^2}{\sigma^2},$$

$\sigma_{sq}^2$ being the squared s.d. inside fraternities of $q$ siblings and being calculated as a mean of such
values obtained from each of the $n$ fraternities. We may here instead of $\sigma_{sq}$ introduce the presumptive
s.d. inside a fraternity, $\sigma_s$, that is the s.d. we expect to find in fraternities consisting of
a great number of siblings. The relation is

$$\sigma_s^2 = \frac{q}{q-1}\,\sigma_{sq}^2,$$

so that we find

$$r = 1 - \frac{\sigma_s^2}{\sigma^2},$$

which shews that the value of $r$ arrived at must be expected to be independent of $q$.














The condition of minimum is

$$q = 1 + \frac{1}{r}.$$

Corresponding to the values $\tfrac14$, $\tfrac13$, $\tfrac12$ and $\tfrac35$ for $r$ the values of $q$ are 5, 4, 3
and $2\tfrac23$.

In examining the question of the most profitable number of siblings, attention
must also be paid to the determination of the parental correlation, and the question
will therefore be further discussed in the following section. Besides, it cannot be left
out of consideration that, as a rule, it will be easier to examine the same number
of individuals distributed among a smaller than among a greater number of fraternities.
When regard is had only to fraternal correlation, the values of $q$ obtained
above must therefore be considered the minimum values.

For a more detailed illustration of the variation of the s.d. of the fraternal
correlation coefficient with the number of siblings Table II has been calculated.
The table gives the values of the s.d. for 1000 observations distributed among from
500 to 100 fraternities, the sizes of which therefore vary from 2 to 10.


TABLE II.

The Standard Deviation of a Fraternal Correlation Coefficient
calculated from 1000 observed Individuals.

| q  | r = 1/4 | r = 1/3 | r = 1/2 | r = 3/5 |
|----|---------|---------|---------|---------|
| 2  | .0419   | .0398   | .0335   | .0286   |
| 3  | .0356   | .0351   | .0316   | .0278   |
| 4  | .0339   | .0344   | .0323   | .0289   |
| 5  | .0335   | .0348   | .0335   | .0304   |
| 6  | .0337   | .0356   | .0350   | .0320   |
| 7  | .0342   | .0365   | .0365   | .0336   |
| 8  | .0349   | .0376   | .0380   | .0352   |
| 9  | .0356   | .0387   | .0395   | .0367   |
| 10 | .0363   | .0398   | .0410   | .0382   |
The table does not show a rapid increase of the s.d. when the number of siblings
increases beyond the most profitable number found above. But a comparison of the
values for $q=5$ and for $q=10$ still shows that the latter are respectively 8%,
14%, 22% and 25% greater than the former, so that when there are 10 siblings
in each fraternity respectively 18%, 31%, 50% and 58% more individuals are
required to obtain the same accuracy than when there are only 5 siblings from each
family.




(h) Application of the Formula to previous Calculations of Correlation.

In an investigation* concerning the characters, number of vertebrae ('Vert.'),
number of rays in the pectoral fins ('Pd.' and 'Ps.') and number of pigment spots
('Pigm.') in Zoarces viviparus from the station Nakkehage in Isefjord, Denmark,
the fraternal correlation coefficient was calculated for 6 (for pigment spots only 5)
samples from different years consisting of fraternities of 10 siblings. In this case
the probable error of the fraternal correlation coefficient is according to (19)

$$\text{P.E.}(r) = \frac{0.67449}{\sqrt{45n}}\,(1-r)(1+9r).$$

* K. Smith, Comptes-Rendus des Trav. du Lab. Carlsberg, Vol. XIV. No. 11, 1921.

Table III gives for each sample the values of $n$, $r$ and P.E. ($r$), as well as $\bar r$ for
all the samples each weighted according to the s.d.
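The probable errors entered in Table III can be recomputed from the formula above. A minimal sketch, with a hypothetical function name and the number of siblings exposed as a parameter so that the case $q=10$ is visibly a special case of (19):

```python
import math


def probable_error_fraternal(r, n, q=10):
    """0.67449 times the s.d. of formula (19); for q = 10 the number of
    sibling pairs is n*q*(q-1)/2 = 45n."""
    pairs = n * q * (q - 1) / 2
    return 0.67449 / math.sqrt(pairs) * (1 - r) * (1 + (q - 1) * r)


# Sample of 1914, vertebrae: n = 138 fraternities, r = 0.4590.
print(round(probable_error_fraternal(0.4590, 138), 4))
```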


TABLE III.

Fraternal Correlation.

| Year when sample taken | Vert.: n | Vert.: r ± P.E. | Pd.: n | Pd.: r ± P.E. | Pigm.: n | Pigm.: r ± P.E. |
|------------------------|----------|-----------------|--------|---------------|----------|-----------------|
| 1914                   | 138      | 0.4590 ± .0238  | 132    | 0.3169 ± .0231 | -       | -               |
| 1915                   | 168      | 0.4693 ± .0215  | 174    | 0.4196 ± .0211 | 75      | 0.3175 ± .0306  |
| 1916                   | 123      | 0.5108 ± .0248  | 122    | 0.3985 ± .0251 | 87      | 0.3418 ± .0289  |
| 1917                   | 177      | 0.4715 ± .0209  | 176    | 0.3634 ± .0206 | 127     | 0.4112 ± .0247  |
| 1918                   | 153      | 0.4801 ± .0225  | 156    | 0.3329 ± .0215 | 113     | 0.3074 ± .0247  |
| 1919                   | 98       | 0.4066 ± .0281  | 98     | 0.2893 ± .0260 | 86      | 0.3722 ± .0296  |
| Weighted mean          | -        | 0.4689 ± .0095  | -      | 0.3564 ± .0092 | -       | 0.3517 ± .0122  |


For the mean values of $r$ probable errors have previously been calculated based
on the 6 or 5 values found. These probable errors had for Vert., Pd. and Pigm.
respectively the values 0.0094, 0.0137 and 0.0128,
which for Vert. and Pigm. agree extremely well with the theoretical values now
found, while for Pd. the error had been estimated somewhat too great.
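The weighted means of Table III can be approximated as follows. The text says only that the samples are "weighted according to the s.d."; the sketch below assumes weights inversely proportional to the squared probable error, which is an assumption of this illustration and not a statement of the author's exact procedure. The function name is hypothetical, and the input values are the Vert. column of Table III.

```python
import math


def weighted_mean_with_pe(values, probable_errors):
    """Inverse-variance weighted mean and the probable error of that mean
    (assumed weighting scheme: weight = 1 / P.E.^2)."""
    weights = [1.0 / pe**2 for pe in probable_errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


vert_r = [0.4590, 0.4693, 0.5108, 0.4715, 0.4801, 0.4066]
vert_pe = [0.0238, 0.0215, 0.0248, 0.0209, 0.0225, 0.0281]
mean, pe = weighted_mean_with_pe(vert_r, vert_pe)
print(round(mean, 4), round(pe, 4))
```

Under this assumption the result agrees with the tabulated 0.4689 ± .0095 to within the rounding of the entries.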


II. PARENTAL CORRELATION. 

For investigation of parental correlation we have a sample consisting as above
of $nq$ offspring values $y_1, y_2, y_3, \dots, y_{nq}$ distributed in $n$ classes with $q$ in each, and
in addition, containing for each class an observed parental value $x$. We aim at
finding the correlation between $x$ and $y$'s of the same class.

Let the parental correlation be $r_p$ and the s.d. for $x$'s be $s'$ in the population which
we may imagine that the sample represents, and let us choose the mean value of
the population as zero point for $x$.

The parental correlation coefficient is from the sample determined by

$$\rho_p = \frac{\Pi_{xy}}{\sigma\sigma'},$$

where $\sigma'$ is the s.d. of $x$ calculated from the sample, and $\Pi_{xy}$ is the product moment
for $x$ and $y$ determined by

$$\Pi_{xy} = \frac{1}{nq}\,\Sigma(x_1y_1) - \bar x\,\bar y \tag{20}.$$

As in the previous section $\Sigma$ denotes a sum of products each of which consists
of factors from the same class. In the sums $S$ each product contains factors from
at least two classes, and when two factors belong to the same class it is indicated
by an 's' inserted between them.

For evaluation of the standard deviation of $\rho_p$ the s.d. of $\Pi_{xy}$, $\sigma$ and $\sigma'$ are
required, as well as the product moments for each pair of these three functions.


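(a) Mean Value and Standard Deviation of the Product Moment $\Pi_{xy}$.
The equation (20) may also be written

$$\Pi_{xy} = \frac{n-1}{n^2q}\,\Sigma(x_1y_1) - \frac{1}{n^2q}\,S(x_1y_1) \tag{21}.$$

By taking the mean value for a great number of samples we therefore find

$$\overline{\Pi_{xy}} = \frac{n-1}{n}\,r_p\,ss' \tag{22}.$$

From (21) we find by squaring and taking mean value

$$n^4q^2\,\overline{\Pi_{xy}^2} = (n-1)^2\,\overline{(\Sigma(x_1y_1))^2} + \overline{(S(x_1y_1))^2} - 2(n-1)\,\overline{\Sigma(x_1y_1)\,S(x_1y_1)} \tag{23}.$$

Together with the determination of the mean values occurring here, we shall
determine the other mean values of products required for the evaluation of $\sigma_{\rho_p}$. They
are such as arise from multiplication of $\Sigma(x_1y_1)$ and $S(x_1y_1)$ with each of the two
groups $\Sigma(y_1^2)$, $\Sigma(y_1y_2)$, $S(y_1y_2)$ and $\Sigma(x_1^2)$, $S(x_1x_2)$, and also those which contain a
factor of each of the two latter groups. As in the foregoing section, we need,
however, not consider products of a $\Sigma$ and an $S$, because such products may be
developed into sums of products all containing a factor uncorrelated with all the
other factors of the product, from which it follows that the mean value for a
great number of samples is zero for each of these sums of products. It remains
to determine the following products:

$$\begin{aligned} (\Sigma(x_1y_1))^2 &= \Sigma(x_1^2y_1^2) + 2\,\Sigma(x_1^2y_1y_2) + 2\,S(\underset{s}{x_1y_1}\,\underset{s}{x_2y_2}) \\ (S(x_1y_1))^2 &= S(x_1^2y_1^2) + 2\,S(x_1^2\,\underset{s}{y_1y_2}) + 2\,S(x_1y_1\,x_2y_2)^{*} + \epsilon_1 \\ \Sigma(x_1y_1)\,\Sigma(y_1^2) &= \Sigma(x_1y_1^3) + \Sigma(x_1y_1^2y_2) + S(\underset{s}{x_1y_1}\,y_2^2) \\ \Sigma(x_1y_1)\,\Sigma(y_1y_2) &= \Sigma(x_1y_1^2y_2) + 3\,\Sigma(x_1y_1y_2y_3) + S(\underset{s}{x_1y_1}\,\underset{s}{y_2y_3}) \\ S(x_1y_1)\,S(y_1y_2) &= S(\underset{s}{x_1y_1}\,y_2^2) + 2\,S(\underset{s}{x_1y_1}\,\underset{s}{y_2y_3}) + \epsilon_2 \\ \Sigma(x_1y_1)\,\Sigma(x_1^2) &= \Sigma(x_1^3y_1) + S(x_1^2\,\underset{s}{x_2y_2}) \\ S(x_1y_1)\,S(x_1x_2) &= S(x_1^2\,\underset{s}{x_2y_2}) + \epsilon_3 \\ \Sigma(x_1^2)\,\Sigma(y_1^2) &= \Sigma(x_1^2y_1^2) + S(x_1^2y_1^2) \\ \Sigma(x_1^2)\,\Sigma(y_1y_2) &= \Sigma(x_1^2y_1y_2) + S(x_1^2\,\underset{s}{y_1y_2}) \\ S(x_1x_2)\,S(y_1y_2) &= S(\underset{s}{x_1y_1}\,\underset{s}{x_2y_2}) + \epsilon_4 \end{aligned} \tag{24}.$$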








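$\epsilon_1$, $\epsilon_2$, $\epsilon_3$ and $\epsilon_4$ are sums, the means of which are 0. The product moments
are, as in the previous section, denoted by $\beta$, and the indices concerning $x$'s are
placed in front of $\beta$; for instance $\dfrac{1}{nq}\,\overline{\Sigma(x_1^2y_1^2)}$ is denoted by ${}_2\beta_2$. We thus find for
the mean values of the sums occurring in (24):

$$\begin{aligned} \overline{\Sigma(x_1^3y_1)} &= nq\;{}_3\beta_1 \\ \overline{\Sigma(x_1^2y_1^2)} &= nq\;{}_2\beta_2 \\ \overline{\Sigma(x_1^2y_1y_2)} &= \tfrac12\,nq(q-1)\;{}_2\beta_{11} \\ \overline{\Sigma(x_1y_1^3)} &= nq\;{}_1\beta_3 \\ \overline{\Sigma(x_1y_1^2y_2)} &= nq(q-1)\;{}_1\beta_{21} \\ \overline{\Sigma(x_1y_1y_2y_3)} &= \tfrac16\,nq(q-1)(q-2)\;{}_1\beta_{111} \\ \overline{S(x_1^2\,\underset{s}{x_2y_2})} &= n(n-1)q\;\underset{ds}{{}_{21}\beta_1} \\ \overline{S(x_1^2y_1^2)} &= n(n-1)q\;\underset{d}{{}_2\beta_2} \\ \overline{S(x_1^2\,\underset{s}{y_1y_2})} &= \tfrac12\,n(n-1)q(q-1)\;\underset{ds}{{}_2\beta_{11}} \\ \overline{S(\underset{s}{x_1y_1}\,\underset{s}{x_2y_2})} &= \tfrac12\,n(n-1)q^2\;\underset{sds}{{}_{11}\beta_{11}} \\ \overline{S(\underset{s}{x_1y_1}\,y_2^2)} &= n(n-1)q^2\;\underset{sd}{{}_1\beta_{12}} \\ \overline{S(\underset{s}{x_1y_1}\,\underset{s}{y_2y_3})} &= \tfrac12\,n(n-1)q^2(q-1)\;\underset{sds}{{}_1\beta_{111}} \end{aligned} \tag{25}.$$

From Bergström's formulae (9) we find, when introducing $r_p$, $r$ and 0 for the
correlation coefficients and remembering that in his formulae $s$ and $s'$ are taken as
units for $y$ and $x$:

$$\begin{aligned} {}_3\beta_1 &= 3r_p\,ss'^3 \\ {}_2\beta_2 &= (2r_p^2+1)\,s^2s'^2 \\ {}_2\beta_{11} &= (2r_p^2+r)\,s^2s'^2 \\ {}_1\beta_3 &= 3r_p\,s's^3 \\ {}_1\beta_{21} &= r_p(1+2r)\,s's^3 \\ {}_1\beta_{111} &= 3rr_p\,s's^3 \\ \underset{ds}{{}_{21}\beta_1} &= r_p\,ss'^3 \\ \underset{d}{{}_2\beta_2} &= s^2s'^2 \\ \underset{ds}{{}_2\beta_{11}} &= r\,s^2s'^2 \\ \underset{sds}{{}_{11}\beta_{11}} &= r_p^2\,s^2s'^2 \\ \underset{sd}{{}_1\beta_{12}} &= r_p\,s's^3 \\ \underset{sds}{{}_1\beta_{111}} &= rr_p\,s's^3 \end{aligned} \tag{26}.$$

* In this single case the notation fails, as it ought to be indicated that the first $x$ and the last $y$
belong to the same class.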

















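Applying (25) and (26) we find for the mean values of the products under
(24) the following values:

$$\begin{aligned} \overline{(\Sigma(x_1y_1))^2} &= nq\,\{q(n+1)\,r_p^2 + 1 + (q-1)\,r\}\,s^2s'^2 \\ \overline{(S(x_1y_1))^2} &= nq(n-1)\,\{qr_p^2 + 1 + (q-1)\,r\}\,s^2s'^2 \\ \overline{\Sigma(x_1y_1)\,\Sigma(y_1^2)} &= nq\,\{nq+2+2(q-1)\,r\}\,r_p\,s's^3 \\ \overline{\Sigma(x_1y_1)\,\Sigma(y_1y_2)} &= \tfrac12\,nq(q-1)\,\{2+(nq+2q-2)\,r\}\,r_p\,s's^3 \\ \overline{S(x_1y_1)\,S(y_1y_2)} &= n(n-1)q^2\,\{1+(q-1)\,r\}\,r_p\,s's^3 \\ \overline{\Sigma(x_1y_1)\,\Sigma(x_1^2)} &= nq(n+2)\,r_p\,s'^3s \\ \overline{S(x_1y_1)\,S(x_1x_2)} &= n(n-1)q\,r_p\,s'^3s \\ \overline{\Sigma(x_1^2)\,\Sigma(y_1^2)} &= nq\,(n+2r_p^2)\,s^2s'^2 \\ \overline{\Sigma(x_1^2)\,\Sigma(y_1y_2)} &= \tfrac12\,nq(q-1)\,\{2r_p^2+nr\}\,s^2s'^2 \\ \overline{S(x_1x_2)\,S(y_1y_2)} &= \tfrac12\,n(n-1)q^2\,r_p^2\,s^2s'^2 \end{aligned} \tag{27}.$$

We may now continue the calculation of $\overline{\Pi_{xy}^2}$. Introducing the mean values
in (23), we get

$$n^2q\,\overline{\Pi_{xy}^2} = (n-1)\,\{nqr_p^2 + 1 + (q-1)\,r\}\,s^2s'^2.$$

From (22) we find

$$n^2q\,(\overline{\Pi_{xy}})^2 = (n-1)^2\,qr_p^2\,s^2s'^2,$$

and when this equation is subtracted from the foregoing

$$\sigma_{\Pi_{xy}}^2 = \overline{\Pi_{xy}^2} - (\overline{\Pi_{xy}})^2 = \frac{n-1}{n^2q}\,\{qr_p^2 + 1 + r(q-1)\}\,s^2s'^2 \tag{28}.$$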


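(b) The Product Moment, $\Pi_{\Pi_{xy},\sigma^2}$, of $\Pi_{xy}$ and $\sigma^2$.
Multiplication of (4) and (21) gives

$$n^4q^3\,\Pi_{xy}\,\sigma^2 = (nq-1)(n-1)\,\Sigma(y_1^2)\,\Sigma(x_1y_1) - 2(n-1)\,\Sigma(x_1y_1)\,\Sigma(y_1y_2) + 2\,S(x_1y_1)\,S(y_1y_2) + \gamma_1,$$

where $\gamma_1$ consists of terms $S \times \Sigma$, the mean values of which are zero.
Taking the mean value and applying (27) we therefore find

$$n^3q^2\,\overline{\Pi_{xy}\,\sigma^2} = (n-1)\,r_p\,\{nq(nq+1) + rnq(q-1)\}\,s's^3.$$

For $\overline{\Pi_{xy}}\cdot\overline{\sigma^2}$ we get from (3) and (22)

$$n^3q^2\,\overline{\Pi_{xy}}\cdot\overline{\sigma^2} = (n-1)\,r_p\,\{nq(nq-1) - rnq(q-1)\}\,s's^3,$$

and accordingly from the two latter equations

$$\Pi_{\Pi_{xy},\sigma^2} = \overline{\Pi_{xy}\,\sigma^2} - \overline{\Pi_{xy}}\cdot\overline{\sigma^2} = \frac{2(n-1)}{n^2q}\,r_p\,\{1+r(q-1)\}\,s's^3 \tag{29}.$$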








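(c) The Product Moment, $\Pi_{\Pi_{xy},\sigma'^2}$, of $\Pi_{xy}$ and $\sigma'^2$.

For $\sigma'^2$ we obtain, from the formulae (4), (3) and (12), which concern $\sigma^2$, by
substituting $x$ for $y$ and putting $q$ equal to 1:

$$\sigma'^2 = \frac{n-1}{n^2}\,\Sigma(x_1^2) - \frac{2}{n^2}\,S(x_1x_2) \tag{30},$$

$$\overline{\sigma'^2} = \frac{n-1}{n}\,s'^2 \tag{31},$$

and

$$\sigma_{\sigma'^2}^2 = \frac{2(n-1)}{n^2}\,s'^4 \tag{32}.$$

By multiplication of (21) and (30) we get

$$n^4q\,\Pi_{xy}\,\sigma'^2 = (n-1)^2\,\Sigma(x_1y_1)\,\Sigma(x_1^2) + 2\,S(x_1y_1)\,S(x_1x_2) + \gamma_2,$$

for which the mean value by application of (27) is found to be

$$n^2\,\overline{\Pi_{xy}\,\sigma'^2} = (n^2-1)\,r_p\,ss'^3.$$

From (22) and (31) follows

$$n^2\,\overline{\Pi_{xy}}\cdot\overline{\sigma'^2} = (n-1)^2\,r_p\,ss'^3,$$

so that

$$\Pi_{\Pi_{xy},\sigma'^2} = \overline{\Pi_{xy}\,\sigma'^2} - \overline{\Pi_{xy}}\cdot\overline{\sigma'^2} = \frac{2(n-1)}{n^2}\,r_p\,ss'^3 \tag{33}.$$

(d) The Product Moment, $\Pi_{\sigma^2,\sigma'^2}$, of $\sigma^2$ and $\sigma'^2$.
For the product $\sigma^2\sigma'^2$ we find by multiplying (4) and (30):

$$n^4q^2\,\sigma^2\sigma'^2 = (nq-1)(n-1)\,\Sigma(x_1^2)\,\Sigma(y_1^2) - 2(n-1)\,\Sigma(x_1^2)\,\Sigma(y_1y_2) + 4\,S(x_1x_2)\,S(y_1y_2) + \gamma_3.$$

The mean value of $\gamma_3$ is zero, and therefore by taking the mean value and
using (27) we get

$$n^2q\,\overline{\sigma^2\sigma'^2} = (n-1)\,\{nq - 1 + 2qr_p^2 - (q-1)\,r\}\,s^2s'^2,$$

and when from this is subtracted

$$n^2q\,\overline{\sigma^2}\cdot\overline{\sigma'^2} = (n-1)\,\{nq - 1 - (q-1)\,r\}\,s^2s'^2,$$

we arrive at

$$\Pi_{\sigma^2,\sigma'^2} = \frac{2(n-1)}{n^2}\,r_p^2\,s^2s'^2 \tag{34}.$$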


(e) The Standard Deviation of the Parental Correlation Coefficient. 
For the logarithm of the parental correlation coefficient calculated from our 
sample we have 
log py = log II, — $log a? — flog o”. 
For great values of », which allow us to treat the deviations of o”, o® and II,, 
from their mean values as differentials, it follows from the above equation by 


differentiation that 
Sp, — ST Lry } So*__, da” 


Pp I], oa 


ne 




















KUIRSTINE SMITH 17 


With the accuracy here employed, which excludes the determination of terms 


“ne : a | 
containing the higher power of - , we have 
n 


Po= Pp = F= — = Tp- 


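From (35) we find by squaring and taking the mean value for a great number of samples
\[ \frac{\sigma^2_{\rho_p}}{\overline{\rho}_p^{\,2}} = \frac{\sigma^2_{\Pi_{xy}}}{\overline{\Pi}_{xy}^{\,2}} + \frac14\,\frac{\sigma^2_{\sigma^2}}{(\overline{\sigma^2})^2} + \frac14\,\frac{\sigma^2_{\sigma'^2}}{(\overline{\sigma'^2})^2} - \frac{\Pi_{\Pi_{xy},\sigma^2}}{\overline{\Pi}_{xy}\,\overline{\sigma^2}} - \frac{\Pi_{\Pi_{xy},\sigma'^2}}{\overline{\Pi}_{xy}\,\overline{\sigma'^2}} + \frac12\,\frac{\Pi_{\sigma^2,\sigma'^2}}{\overline{\sigma^2}\;\overline{\sigma'^2}}, \]
which by introducing the values from (3), (12), (22), (28), (29), (31), (32), (33) and (34) and neglecting the term containing the higher power of $\frac{1}{n}$ leads to
\[ \sigma^2_{\rho_p} = \frac{1}{n}\left[\frac{1+(q-1)\,r}{q} - \frac{r_p^2}{2q}\,\{q + 3 + (q-1)\,r\,(4-r)\} + r_p^4\right], \]
which may be written
\[ \sigma^2_{\rho_p} = \frac{1}{n}\left[(1-r_p^2)^2 - \frac{q-1}{q}\,(1-r)\left\{1 - \frac{3-r}{2}\,r_p^2\right\}\right] \tag{36} \]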


The first term is the usual expression obtained for $q=1$. From this, for $q>1$, one must subtract a term which for given values of $r$ and $r_p$ increases with $q$.
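As a numerical check on (36), the following minimal Python sketch evaluates the variance in its compact form and in the expanded form from which it is obtained. The function names are illustrative only, and the formula is taken to read $\sigma^2_{\rho_p} = \frac{1}{n}\left[(1-r_p^2)^2 - \frac{q-1}{q}(1-r)\{1-\frac{3-r}{2}r_p^2\}\right]$.

```python
def var_rho_compact(r_p, r, q, n):
    """Large-sample variance of the parental correlation, compact form of (36).

    r_p: parental correlation, r: fraternal correlation,
    q: offspring per family, n: number of families.
    """
    usual = (1.0 - r_p ** 2) ** 2          # the usual term, obtained for q = 1
    correction = (q - 1) / q * (1.0 - r) * (1.0 - 0.5 * (3.0 - r) * r_p ** 2)
    return (usual - correction) / n


def var_rho_expanded(r_p, r, q, n):
    """The same quantity before collecting terms (assumed intermediate form)."""
    first = (1.0 + (q - 1) * r) / q
    second = r_p ** 2 / (2.0 * q) * (q + 3 + (q - 1) * r * (4.0 - r))
    return (first - second + r_p ** 4) / n
```

For $q=1$ both functions reduce to $(1-r_p^2)^2/n$, and for fixed $n$ the subtracted term grows with $q$ through the factor $(q-1)/q$.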


(f) The Standard Deviation of the Slopes of the Regression Curves. 


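We shall finally add the formulae of the S.D. of the slope of the regression curves, for the calculation of which we have all the material ready. The regression coefficients are determined by
\[ a_{yx} = \frac{\Pi_{xy}}{\sigma'^2} \quad\text{and}\quad a_{xy} = \frac{\Pi_{xy}}{\sigma^2}. \]
By differentiation, squaring and taking mean value, we find
\[ \frac{\sigma^2_{a_{yx}}}{\overline{a}_{yx}^{\,2}} = \frac{\sigma^2_{\Pi_{xy}}}{\overline{\Pi}_{xy}^{\,2}} + \frac{\sigma^2_{\sigma'^2}}{(\overline{\sigma'^2})^2} - \frac{2\,\Pi_{\Pi_{xy},\sigma'^2}}{\overline{\Pi}_{xy}\,\overline{\sigma'^2}} \]
and a corresponding equation for $\sigma^2_{a_{xy}}$.

From these we find by introducing the S.D. and product moments
\[ \sigma^2_{a_{yx}} = \frac{s^2}{nq\,s'^2}\,\{1 + r(q-1) - q\,r_p^2\} \]*
and
\[ \sigma^2_{a_{xy}} = \frac{s'^2}{nq\,s^2}\,\{1 + (q-1)\,r - q\,r_p^2 + 2(q-1)(1-r)^2\,r_p^2\}. \]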


* Vide K. Smith, l.c., pp. 6, 7, where the same formula is deduced in a different form, containing $q$ instead of $r$. The two expressions are easily seen to be identical when the term containing the higher power of $\frac{1}{n}$ is neglected.
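The two slope variances of section (f) can be evaluated with a short Python sketch. The labels `var_offspring_on_parent` and `var_parent_on_offspring` and the function name are assumptions introduced here for illustration; the expressions are the large-sample forms $\frac{s^2}{nq\,s'^2}\{1+r(q-1)-q r_p^2\}$ and $\frac{s'^2}{nq\,s^2}\{1+(q-1)r-q r_p^2+2(q-1)(1-r)^2 r_p^2\}$.

```python
def slope_variances(r_p, r, q, n, s, s_prime):
    """Large-sample variances of the two regression slopes.

    s: S.D. of offspring values, s_prime: S.D. of parental values,
    n: number of families, q: offspring per family.
    Returns (var_offspring_on_parent, var_parent_on_offspring).
    """
    common = 1.0 + (q - 1) * r - q * r_p ** 2
    var_offspring_on_parent = s ** 2 / (n * q * s_prime ** 2) * common
    extra = 2.0 * (q - 1) * (1.0 - r) ** 2 * r_p ** 2
    var_parent_on_offspring = s_prime ** 2 / (n * q * s ** 2) * (common + extra)
    return var_offspring_on_parent, var_parent_on_offspring
```

With $q=1$ both expressions fall back to the familiar $(1-r_p^2)/n$ multiplied by the ratio of the two variances.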












(g) Numerical Evaluation of the Formula for the S.D. of a Parental Correlation Coefficient.


We shall first examine how valuable a material consisting of $n$ groups of $q$ siblings with corresponding parental values is compared with $nq$ pairs of values from different families. Denoting the S.D.'s of $\rho_p$ calculated from the two materials by ${}_q\sigma_{\rho_p}$ and $\sigma_{\rho_p}$, we find by applying (36)
\[ \frac{\sigma^2_{\rho_p}}{{}_q\sigma^2_{\rho_p}} = \frac{(1-r_p^2)^2}{q\,(1-r_p^2)^2 - (q-1)(1-r)\left\{1 - \frac{3-r}{2}\,r_p^2\right\}} \tag{37} \]
This ratio indicates the value of an observed pair, when the parental value also occurs combined with $(q-1)$ other offspring values, in proportion to the value of an observed pair when the parental value only occurs once in the calculation.

The numerical values of (37) are, for values of $r_p$ and $r$ fairly well representative of the values met with in investigations of inheritance, given in Table IV.


TABLE IV.

Values of $\sigma^2_{\rho_p} / {}_q\sigma^2_{\rho_p}$.

  q     r_p = .3, r = .4     r_p = .4, r = .5     r_p = .5, r = .6
  1          1.000                1.000                1.000
  2          0.735                0.698                0.666
  3          0.581                0.536                0.499
  4          0.481                0.435                0.399
  5          0.410                0.366                0.332
  6          0.357                0.316                0.285
  7          0.316                0.278                0.249
  8          0.284                0.248                0.221
  9          0.258                0.224                0.199
 10          0.236                0.204                0.181
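The entries of Table IV can be recomputed from (37). The sketch below is a minimal illustration in Python; the function name `relative_value` is hypothetical, and (37) is taken to be $(1-r_p^2)^2 / [\,q(1-r_p^2)^2 - (q-1)(1-r)\{1-\frac{3-r}{2}r_p^2\}\,]$.

```python
def relative_value(r_p, r, q):
    """Ratio (37): weight of one parent-offspring pair in a family of q
    relative to a pair whose parental value is used only once."""
    usual = (1.0 - r_p ** 2) ** 2
    correction = (1.0 - r) * (1.0 - 0.5 * (3.0 - r) * r_p ** 2)
    return usual / (q * usual - (q - 1) * correction)


def table_iv_rows(pairs=((0.3, 0.4), (0.4, 0.5), (0.5, 0.6)), max_q=10):
    """Rows (q, ratio_1, ratio_2, ...) rounded to three decimals."""
    return [
        (q,) + tuple(round(relative_value(r_p, r, q), 3) for r_p, r in pairs)
        for q in range(1, max_q + 1)
    ]
```

Calling `table_iv_rows()` returns one tuple per value of $q$, which can be compared line by line with the table.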


It appears that entering into the same parental correlation table families with numbers of offspring varying from, for example, 1 to 5, the same weight is given to pairs of observations which according to Table IV ought to vary in weight from 1 to 0.4.

It is therefore a more rational proceeding to sort the families according to the number of offspring and deal with each group separately. The work may then be shortened by calculating the correlation coefficient between the parental value and the mean for the offspring, from which the parental correlation for individuals is obtained by multiplying with $\frac{\sigma_q}{\sigma}$, $\sigma_q$ being as above (see I (g)) the S.D. for means of fraternities of $q$ individuals. It is then possible to calculate the correlation coefficient with S.D. for each group of families and finally calculate a mean value for the correlation coefficient.
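The conversion described in this paragraph is a single multiplication. A minimal Python sketch follows, with hypothetical function names; the helper `sd_ratio_under_model` additionally assumes equal offspring variance $s^2$ and a common fraternal correlation $r$, so that $\sigma_q/\sigma = \sqrt{\{1+(q-1)r\}/q}$, which is an assumption made here for the illustration rather than a statement from the text.

```python
import math


def individual_parental_corr(corr_with_mean, sigma_q, sigma):
    """Parental correlation for individuals from the correlation between
    the parental value and the mean of the q offspring."""
    return corr_with_mean * sigma_q / sigma


def sd_ratio_under_model(r, q):
    """sigma_q / sigma for q siblings with common fraternal correlation r
    (model assumption used only for this illustration)."""
    return math.sqrt((1.0 + (q - 1) * r) / q)
```

For instance, with $r=0.5$ and $q=4$ the ratio is $\sqrt{0.625}$, so a correlation of about 0.506 between parent and offspring mean corresponds to an individual parental correlation of 0.4.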
















In investigations of inheritance with animals with numerous offspring it is as 
a rule easier to provide information of a given number of individuals among 
a small number of families than to examine the same number of individuals if 
they belong to a larger number of families. The labour required is therefore not 
proportional to the number of individuals and it must be estimated for the 
individual materials whether the encumbrance of dealing with a relatively large 
number of families is duly compensated for by the reduction of the number of 
individuals hereby permissible. 


It does not seem at the outset probable, but it may be possible, that, even in cases in which parent and offspring are equally easily available for investigation, a shortening of labour, that is, a diminution of the total number of observed individuals, may be obtainable by examining several offspring individuals of each family. We will therefore examine for which value of $q$, ${}_q\sigma^2_{\rho_p}$ is a minimum when $n(q+1)$ is put equal to a constant $k$. We find the condition
\[ (1-r_p^2)^2 - \left(1+\frac{1}{q^2}\right)(1-r)\left\{1 - \frac{3-r}{2}\,r_p^2\right\} = 0, \]
from which follows
\[ q^2 = \frac{(1-r)\left\{1 - \frac{3-r}{2}\,r_p^2\right\}}{(1-r_p^2)^2 - (1-r)\left\{1 - \frac{3-r}{2}\,r_p^2\right\}}. \]


To obtain a survey we introduce a few sets of values for $r_p$ and $r$ for which we give the result in Table V.


TABLE V.

  r_p      r       q
  0.20    0.25    1.8
  0.30    0.40    1.3

It will be seen that for sufficiently small values of $r$ and $r_p$ it is profitable to examine several siblings of each family in those cases where the examination of an offspring individual requires the same labour as that of a parent.


As a guide for the choice of the number of offspring in the more frequently occurring case when it is easier to provide data of offspring than of parent, we give in Table VI for some values of $r_p$ and $r$ the number of observations which, for varying values of $q$, yield the same accuracy in the parental correlation coefficient as 1000 parents with 1000 offspring.


It appears from the table that while the number of offspring increases evenly with increasing $q$ the number of parents decreases more and more slowly, so that the compensation obtained in this way for the increased total number of offspring tends to be very small for increasing $q$. Already by increasing $q$ from 5 to 6 we find, for $r_p = .3$ and $r = .4$, that to outweigh the augmentation of 360 in the number of offspring, we only get a diminution of 21 in the number of parents.


TABLE VI.

Number of Parental and Offspring Individuals which for varying $q$ yield the same Accuracy to $\rho_p$.

        r_p = .3, r = .4         r_p = .4, r = .5         r_p = .5, r = .6
        σ_ρp = .0288             σ_ρp = .0266             σ_ρp = .0237

  q     Parents   Offspring      Parents   Offspring      Parents   Offspring
  1      1000       1000          1000       1000          1000       1000
  2       680       1360           717       1433           751       1502
  3       573       1720           622       1866           668       2004
  4       520       2081           575       2299           627       2507
  5       488       2441           546       2732           602       3009
  6       467       2801           528       3166           585       3511
  7       452       3161           514       3599           573       4013
  8       440       3522           504       4032           565       4516
  9       431       3882           496       4465           558       5018
 10       424       4242           490       4898           552       5520
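The numbers of Table VI follow from equating the variance (36) for $n$ families of $q$ offspring with the variance $(1-r_p^2)^2/1000$ of 1000 independent pairs. A minimal Python sketch under that reading, with an illustrative function name, is:

```python
def equivalent_sample(r_p, r, q, base_pairs=1000):
    """Parents and offspring needed, with q offspring per parent, to match
    the accuracy of `base_pairs` parents with one offspring each."""
    usual = (1.0 - r_p ** 2) ** 2
    correction = (q - 1) / q * (1.0 - r) * (1.0 - 0.5 * (3.0 - r) * r_p ** 2)
    parents = base_pairs * (usual - correction) / usual
    return parents, q * parents
```

The offspring count is formed from the unrounded number of parents, which is why, for example, 573 parents at $q=3$ appear alongside 1720 rather than 1719 offspring.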














For fraternal correlation we have found (see Table II) that the most profitable number of offspring was 3-4 for the values of $r$ now considered, and that a somewhat greater number was not substantially opposed to economy of work. Whether the number ought to be increased beyond 3-4 or confined to even fewer offspring individuals from each family depends in each investigation upon the relative difficulty of observing parents and offspring.





(h) Application of the Formula to previous Calculations of Correlation. 


For the investigation of Zoarces viviparus mentioned in the previous section, in which 10 offspring individuals were examined for each mother, we have according to (36) the following formula for the probable error of the maternal correlation coefficient:
\[ \text{P.E.}\,(r_p) = \frac{.6745}{\sqrt{n}}\left[(1-r_p^2)^2 - \frac{9}{10}\,(1-r)\left\{1 - \frac{3-r}{2}\,r_p^2\right\}\right]^{\frac12}. \]

In Table VII are found the values of $r_p$ for the three characters examined as well as their probable errors calculated from this formula. Giving each of these values of $r_p$ its due weight we have calculated a mean value and its probable error.
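The probable error formula above is 0.6745 times the square root of (36). A small Python sketch, with a hypothetical function name and with $q$ left as a parameter (default 10, as in this investigation), might read:

```python
import math


def probable_error_parental(r_p, r, n, q=10):
    """Probable error of the parental (here maternal) correlation coefficient
    for n families with q offspring each, based on formula (36)."""
    usual = (1.0 - r_p ** 2) ** 2
    correction = (q - 1) / q * (1.0 - r) * (1.0 - 0.5 * (3.0 - r) * r_p ** 2)
    return 0.6745 * math.sqrt((usual - correction) / n)
```

The sample sizes and fraternal correlations of the individual samples are not restated in this section, so no particular entry of Table VII is reproduced here.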





























TABLE VII.

Maternal Correlation.

  Year when        Vert.               P.d.               Pigm.
  sample taken     r_p ± P.E.          r_p ± P.E.         r_p ± P.E.

  1914             0.3513 ± .0343      0.2409 ± .0332     ...
  1915             0.4375 ± .0281      0.3215 ± .0303     0.3762 ± .0381
  1916             0.4139 ± .0355      0.2116 ± .0387     0.3622 ± .0373
  1917             0.3775 ± .0298      0.2824 ± .0293     0.3722 ± .0332
  1918             0.4382 ± .0298      0.2928 ± .0298     0.3710 ± .0308
  1919             0.3674 ± .0378      0.1851 ± .0387     0.3380 ± .0398

  From total       0.4021 ± .0131      0.2654 ± .0133     0.3654 ± .0158
  samples








It appears that these probable errors agree extremely well with those originally 
calculated* on the basis of the 5 or 6 values of the correlation coefficient obtained 
from 5 or 6 samples. 
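The weighted mean in the last line of Table VII can be checked by weighting each yearly value inversely as the square of its probable error. The Python sketch below assumes that this is the weighting meant by "due weight"; the function name is illustrative, and the data are the six values of the second character (P.d.) as read from the table.

```python
import math


def weighted_mean_with_pe(values, probable_errors):
    """Inverse-variance weighted mean and the probable error of that mean."""
    weights = [1.0 / pe ** 2 for pe in probable_errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


# Second character of Table VII, samples of 1914 to 1919.
pd_values = [0.2409, 0.3215, 0.2116, 0.2824, 0.2928, 0.1851]
pd_errors = [0.0332, 0.0303, 0.0387, 0.0293, 0.0298, 0.0387]
pd_mean, pd_pe = weighted_mean_with_pe(pd_values, pd_errors)
```

Because probable errors are proportional to standard deviations, the same weights apply whether one works in standard deviations or in probable errors.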


Summary. 


In the first section we dealt with fraternal correlation and a formula was deduced 
for the standard deviation of the fraternal correlation coefficient for the case when 
the material of observation consists of equal numbers of offspring from each family 
and when each available pair of siblings is introduced into the calculation. The 
formula is calculated on the supposition of normal distribution and normal fraternal
correlation. 

It is shewn by means of the formula that forming fraternal correlation tables 
for fraternities of different numbers and giving each pair of observations the same 
weight we disturb very highly the distribution of weight which the observations 
must claim according to their nature. We find further from the formula that 
when the number of observed offspring from each family may be freely chosen 
the best determination of fraternal correlation from a given number of observations 


is obtained by taking $\left(1 + \frac{1}{r}\right)$ offspring individuals from each family ($r$ = frater. corr. coeff.).


In the second section we deduce, also supposing normal distribution and 
normal correlation, the s.D. of the parental correlation coefficient calculated from a 
material comprising equal numbers of offspring from each family. The formula 
shews that forming parental correlation tables of a material consisting of families 
of different sizes we also in an unfortunate manner disturb the due distribution 


of weight among the pairs of observation. It is shewn that if observations of 


* Vide l.c., p. 24, Table 6. 










parents are as easily produced as those of offspring it is, for determination of 
parental correlation, only for small values of the corr. coeffs., for instance r,< } 
and r<}, profitable to include more than one offspring individual from each 
family in the calculation. For the case more frequently occurring, when the observation of parents represents more labour or greater cost than that of offspring, we have for certain values of $r_p$ and $r$ and varying sizes of fraternities calculated such numbers of parents and of offspring which yield the same accuracy to the parental correlation as 1000 parents with corresponding 1000 offspring. Table VI shews that when the number of siblings exceeds 4-5, there is not much gained by increasing it.


Considering both fraternal and parental correlation we may therefore generally conclude that an essential increase in the number of offspring beyond $1 + \frac{1}{r}$, i.e. in practice 3-4, is only then to be recommended, when it causes a relatively insignificant increase in labour.


This research has been occasioned by the investigations of inheritance carried out by the Carlsberg Laboratorium, København, and I am much indebted to Dr. Johs. Schmidt for the interest he has taken in my work.











ON THE VARIATIONS IN PERSONAL EQUATION AND
THE CORRELATION OF SUCCESSIVE JUDGMENTS.

By EGON S. PEARSON, Trinity College, Cambridge.

CONTENTS.

I. Introduction
II. Generalized Theory of Personal Equation
III. Description of the Experiments
    (a) Experiments A and B
    (b) Experiments C and D
    (c) Experiment E
IV. Terminology and Table defining Constants
V. On Methods of Reduction
    (a) Variate Difference Correlation
    (b) Application of the Results of V. (a)
VI. Experiment A. Reduction
    (a) The individual Series
    (b) The Combination of Series
    (c) On the possible Result of shifting the Head during the course of a Series
    (d) Summary of Results
VII. Experiment B. Reduction
    (a) The individual Series
    (b) The Combination of Series
    (c) Comparison with Experiment A
VIII. Experiment C. Reduction
    (a) The individual Series
    (b) The Combination of Series
IX. Experiment D. Reduction
    (a) The individual Series
    (b) The Combination of Series
    (c) Comparison with Experiment C
X. Experiment E. Reduction
XI. Analysis of the Correlation between successive Judgments
    (a) The Theory of correlated Estimates and accidental Errors
    (b) Application of Theory to the Results of the Experiments
XII. Prediction
Summary and Conclusions




I. INTRODUCTION. 


Starting from Bessel’s discovery, in the early part of the last century, of the 
existence of a definite relative personal equation for two observers recording 
transits by the eye and ear method, there has been a continuous discussion among 
astronomers on the errors which such personal equations may introduce, and on 
the methods of eliminating them or correcting for them*. In such discussions 
it has been the usual practice to take the yearly mean personal equation, whether 
relative or absolute, of different observers and to use this mean personal equation 
as the basis of any correction to be applied to observations made in that year. 
From a comparison of the yearly means it is admitted that there may be gradual
secular changes in personal equation, but it is found that for experienced observers 
there is usually very little variation. In text-books on Practical Astronomy brief 
mention of the subject is usually made, and the conclusion drawn is that for an 
observer in normal health, the personal equation in any one type of observation 
will remain sensibly constant for “short periods” of time; an exact definition of 
the words “short period” is not and clearly cannot be attempted†. It is further
assumed that variations from the personal equation are due to accidental errors 
and may be taken as randomly distributed in accordance with the Gaussian Law. 
With the recent introduction of photography and mechanical methods of record, 
the interest of the astronomer in the subject has to some extent diminished, but 
there are many fields of scientific observation where the human element cannot be 
eliminated, and in the modern researches of the psychologist we find a study is 
made of problems of this type for their own interest and for the light which they 
may throw on the working of the human machine. 

One very important aspect of the problem of personal equation, and of par- 
ticular import to the astronomer, was discussed in detail in a paper entitled “On 
the Mathematical Theory of Errors of Judgment, with Special Reference to the 
Personal Equation,” published in the Phil. Trans. (Vol. 198A, p. 235). In this 
case various series of experiments were carried out simultaneously by three 
observers under identical conditions and it was found that there was a marked 
correlation between the variations in absolute personal equation of the different 
observers. This in itself was sufficient to show that the judgments of any one 
observer were not randomly distributed about his mean personal equation. The 
purpose of the present paper is to discuss the variations in judgment of one ob- 
server, and to inquire how far the evidence of four or five experiments suggests 
that the theory of personal equation and of errors of judgment, as usually accepted, 
requires modification.

The subject is a large one, and much beyond the scope of a single paper; but 
by making careful inquiries of this type with the help of statistical methods, it 

* For example, Monthly Notices, Vol. XL. 1880, pp. 75, 165, 302 (Discussion of Greenwich Observations of the Moon); Monthly Notices, Vol. XLIV. 1884, pp. 1 and 39 (Greenwich Observations of the Sun); Monthly Notices, Vol. LVII. 1897, p. 504 (General Discussion of relative personal Equations).

† For example, in Campbell’s Elements of Practical Astronomy, 1899, p. 157; Young’s General Astronomy, Revised Edn. § 114, and Chauvenet’s Spherical and Practical Astronomy, 4th Edn. I. p. 189.

























may be possible to construct a more generalised theory of errors of judgment 
than that which has hitherto been adopted, and although the practical corrections 
which such a theory will impose may not be large, yet a more detailed knowledge 
of the nature of the variations and perhaps some insight into the psychological 
and physiological factors which underlie them, will give the observer a clearer 
idea of the precautions to be taken to avoid error and a greater justification for 
confidence in his results. 


II. GENERALISED THEORY OF PERSONAL EQUATION. 


Before proceeding to the reduction of the Experiments which have been carried 
out, I will consider whether it is not possible to make a very general, and yet 
simple, analysis of personal equation. Let us suppose that we have a large number, 
N, of observations, which have been made in separate groups, or at what may be 
termed separate sessions. For the astronomer, a session will be a night’s work ; 
for the physicist or psychologist, one continuous set of readings or observations. 
Any particular observation $y$ may be designated (1) by $\tau$, a function of the time when it was recorded, measured from some fixed epoch, or (2) by the number of the session in which it was made, and $t$, the time of record measured from the commencement of that session. E.g. an observation made in the $p$th session may be written either as $y_\tau$ or ${}_p y_t$. We will suppose that the secular change can be represented by the function $\phi(\tau)$, but in addition to this change there may be another of a different type which may be termed the sessional change, and will be represented by the function $f_p(t)$. The fundamental difference between
a secular and sessional change is this: if there is a break of some hours or perhaps 
days between two series of observations, the sessional change of the first series 
will have no influence on the judgments of the second series, while the secular 
change will continue from series to series. The sessional change is thus peculiar 
to its own session or series of observations, although it is very possible that the 
same type of change may be repeated in session after session; it may be a change
resulting simply from fatigue or perhaps from more complex causes. Figure 3 
(p. 46) provides a good illustration of secular and sessional changes ; the centres of 
the small circles represent the mean values of twenty different series of observations, 
and it will be seen that the general tendency is for a drop in mean judgment 
from left to right of the diagram; this is the secular change. The sessional 
changes are represented by the continuous lines drawn through the centres of 
the circles, and the slope of these lines is on the whole seen to be very constant 
throughout the twenty series. In this case the secular and sessional changes are 
acting in the same direction, but they may well act in opposite directions. 

We have thus seen that an observation y may be expressed in the form 


\[ y = \phi(\tau) + f_p(t) + Y_t \tag{i} \]
where $Y_t$ is the residual after the removal of secular and sessional changes. The duration of the session is likely to be so short compared with the period over which the secular change is measured, that $\tau$ may be taken as practically constant for any one session, and $\phi(\tau_p)$ may be described as the secular term in the observations of the $p$th session. It remains therefore to consider the function $f_p(t)$.
Supposing that there were $n$ observations made in a session, it would of course be possible to fit an $(n-1)$th order parabola on which all the observations would lie, so that the values of $Y_t$ would all be zero, but such a curve would be entirely useless. If the observations are made at finite intervals so that we can imagine that one may be interpolated between two others, owing to the mass of random errors to which each judgment is subject, we should not for a moment expect that the interpolated error would lie on, or even close to the $(n-1)$th order parabola. A curve of far lower order would probably give a much better fit. If the sessional change is a sign of some physiological change of state which is affecting the observer’s judgment, it is natural to suppose that it can be represented fairly closely by some simple curve, a low order parabola if not a straight line, or perhaps, if periodic, a sine curve. Suppose that in a practical case, a first or second order parabola has been fitted to the observations of a session; then it will be easy to test whether the residuals $Y_t$ follow a Gaussian distribution; a simple practically sufficient, if not theoretically sufficient test would be to find whether
\[ \beta_1 = \frac{n\left(\sum_{t=1}^{n} Y_t^3\right)^2}{\left(\sum_{t=1}^{n} Y_t^2\right)^3} = 0 \tag{ii} \]
\[ \beta_2 = \frac{n\sum_{t=1}^{n} Y_t^4}{\left(\sum_{t=1}^{n} Y_t^2\right)^2} = 3 \tag{iii} \]
approximately.
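The test in (ii) and (iii) amounts to computing the two moment ratios of the residuals. The Python sketch below is a minimal illustration; the function name is hypothetical, and the residuals are assumed to have mean zero (as residuals from a fitted curve), so that raw power sums can stand in for central moments.

```python
def beta_coefficients(residuals):
    """Return (beta_1, beta_2) for residuals with mean zero.

    beta_1 = n (sum Y^3)^2 / (sum Y^2)^3, near 0 for a Gaussian sample;
    beta_2 = n  sum Y^4    / (sum Y^2)^2, near 3 for a Gaussian sample.
    """
    n = len(residuals)
    s2 = sum(y ** 2 for y in residuals)
    s3 = sum(y ** 3 for y in residuals)
    s4 = sum(y ** 4 for y in residuals)
    return n * s3 ** 2 / s2 ** 3, n * s4 / s2 ** 2
```

A symmetric two-valued set such as `[-1, 1, -1, 1]` gives $\beta_1 = 0$ but $\beta_2 = 1$, showing that symmetry alone does not establish a Gaussian form.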

But there is a further possibility; it may be found that although the relations (ii) and (iii) hold approximately, the $Y_t$’s are not randomly distributed in time, and that there is in fact a correlation between the successive values of $Y_t$, so that
\[ \frac{\sum_{t=1}^{n-l} Y_t\,Y_{t+l}}{\sqrt{\sum_{t=1}^{n-l} Y_t^2 \;\sum_{t=1}^{n-l} Y_{t+l}^2}} \neq 0 \]
for perhaps several positive integral values of $l$ from 1 upwards.
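The correlation between residuals separated by $l$ steps can be computed as follows. This Python sketch is an illustration only: the function name is hypothetical, and the sums are taken over the $n-l$ pairs for which both $Y_t$ and $Y_{t+l}$ exist, which is an assumption about the limits of summation.

```python
import math


def lag_correlation(residuals, lag):
    """Correlation between Y_t and Y_{t+lag}, with products and squares
    summed about zero over the n - lag available pairs."""
    if lag < 1 or lag >= len(residuals):
        raise ValueError("lag must satisfy 1 <= lag < number of residuals")
    head = residuals[:-lag]   # Y_t
    tail = residuals[lag:]    # Y_{t+lag}
    numerator = sum(a * b for a, b in zip(head, tail))
    denominator = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return numerator / denominator
```

A strictly alternating series gives $-1$ at lag 1 and $+1$ at lag 2, the extreme case of residuals that are not randomly distributed in time.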


To emphasise the importance of the different terms in the relation
\[ {}_p y_t = \phi(\tau_p) + f_p(t) + Y_t \tag*{(i) bis} \]
let us take the case of an astronomer who makes a number of observations, often at many days’ interval. He will take a mean
\[ \bar{y} = \text{mean}\;\phi(\tau_p) + \text{mean}\;f_p(t), \]
but he must not suppose that the quantities
\[ {}_p y_t - \bar{y} = \phi(\tau_p) - \text{mean}\;\phi(\tau_p) + f_p(t) - \text{mean}\;f_p(t) + Y_t \]
follow a Gaussian distribution. It will be only a part of the expression that does so, the $Y_t$’s, and it is possible that even these may not be true.

Further it is clear that successive values of ${}_p y_t - \bar{y}$ will not be independent;
correlation will arise from the inclusion of both the secular and sessional terms, 





































and perhaps too from a relationship between the successive $Y_t$’s. There may be no large scale sessional change, and it may be possible to correct for a secular change in personal equation, but even then the mean of a small number “$m$” of successive observations, subject to its probable error $.6745\,\sigma\,\frac{1}{\sqrt{m}}$, will not be a satisfactory approximation to the true value of the quantity observed, if these “$m$” observations
are correlated. Suppose for example that the points in Figure 14 (p. 76) represent a 
series of successive observations which have been corrected for any secular change 
in personal equation; the linear sessional change is small and has been represented 
by the continuous straight line, while the dotted straight line represents the mean 
value of the 63 observations. Yet many sets of 10 consecutive observations could 
be taken, the difference between the mean of which and that of the whole 63 
would be far greater than would be anticipated from the value of the probable error 
calculated from the expression above. This is because the observations are not 
randomly distributed in time. 
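How much the usual probable error understates the uncertainty of a mean of correlated observations can be illustrated numerically. The Python sketch below is not taken from the paper: it assumes a stationary series (constant standard deviation $\sigma$, correlation depending only on the lag $l$), for which the variance of the mean of $m$ successive observations is $\frac{\sigma^2}{m}\{1 + 2\sum_{l=1}^{m-1}(1-\frac{l}{m})\rho_l\}$; the function name and arguments are hypothetical.

```python
import math


def probable_error_of_mean(sigma, m, lag_correlations=()):
    """Probable error of the mean of m successive observations.

    lag_correlations[l - 1] is the correlation at lag l; lags not supplied
    are treated as zero. With no correlations this is 0.6745 * sigma / sqrt(m).
    """
    factor = 1.0
    for lag, rho in enumerate(lag_correlations[: m - 1], start=1):
        factor += 2.0 * (1.0 - lag / m) * rho
    return 0.6745 * sigma * math.sqrt(factor / m)
```

With positive correlations at the first few lags the result exceeds $.6745\,\sigma/\sqrt{m}$, and in the limiting case of perfect correlation at every lag averaging gives no gain at all.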

In addition to secular and sessional changes in the value of an estimation, there may be similar changes in the standard deviation; the judgments may become more erratic or less so. A sessional change giving an increase in standard deviation would suggest the effect of fatigue; and secular change decreasing the standard deviation might be the indication of increased accuracy with experience. An example of secular change in personal equation and standard deviation is illustrated in the diagram on p. 84; the details of this will be discussed more fully in the reduction of Experiment D, but it is here sufficient to say that the central curve represents the smoothed personal equation, while the distance between any point on this curve and either of the outer curves gives the smoothed standard deviation at that point or period in the series of observations. It will be seen that the standard deviation increases in the later observations.


It would be out of place at this point to enter further into the details of 
variation in personal equation and correlation of judgments, but I think that 
enough has been said to indicate the general lines of enquiry. In choosing the 
experiments which will be described in the following sections, the aim has been 
to select those in which there was likely to be considerable variation in judgment, 
and where consequently the secular and sessional changes, if present, would be 
clearly recognizable and the correlation of successive judgments easy to measure. 
It was also important that the errors in measurement should be small compared 
with the variations in judgment.


It may of course be urged that the experiments should have been carried out 
by an observer who was unaware of the lines of enquiry and therefore not liable to 
bias of any form, but this was not practicable, and in fact none of the reductions 
had been completed nor the general theory developed before all the experiments 
had been carried out, and I do not think that the observations could have been 
affected by any conscious or unconscious prejudice. 













III. THE EXPERIMENTS.


The present paper is based on the reduction of the following Experiments : 
A. Estimation of the value of a Third, or Trisection Experiment. 
B. Estimation of the value of a Half, or Bisection Experiment. 
C. Estimation of Time, by counting of Ten Seconds. 
D. Estimation of Ten Seconds without intermediate counting. 


E. Some repeated measurements of fine structure in a Stellar Spectrum, 
with a Zeiss Comparator. 

The first four Experiments were carried out by the writer in accordance with a 
uniform scheme ; each Experiment was divided into 20 series of 63 observations, 
making 1260 observations in all. Only one series (or 63 observations) was done 
at a sitting to avoid as far as possible the effect of fatigue; in the case of 
Experiments A and B the sequence of the series was much broken, spreading 
over some weeks, but C and D were carried out within four consecutive days. 
The dates of the series are given with the detailed discussion of the observations 
below. 


(a) Experiments A and B.


Figure 1 is a copy of one of the printed forms used for these experiments; the longer line was used for A; distance between inner edges of bounding marks 7.53 inches; the shorter line was used for B; distance between inner edges of bounding marks 5.94 inches.


The lines were on the same form simply for convenience in printing, etc. and 
that not used was concealed while the observation on the other was being made ; 
a fresh line was used for each of the 1260 observations. In carrying out a series 
a pile of 63 forms was placed on a table slightly tilted up towards the observer, 
and straight in front of him, with a good light coming from the left-hand side, the 
pencil being in his right hand. He then made a short pencil stroke across the line 
at the point which he estimated was one-third way along the line from the left- 
hand end (Experiment A), or at the point which he considered to bisect the line 
(Experiment B). He then turned the form over, face downwards at his side, and 
proceeded to deal with the next form in the same manner, continuing until the 63 
were finished*. The pencil stroke was made after a rapid eye estimate, the aim
being to record the first impression of third or half formed upon seeing the fresh 
line, and to avoid hesitation; the average time taken in going through a series of 
63 observations was 5 minutes 40 seconds for Trisection, 5 minutes 22 seconds for 
Bisection, or 5.4 seconds and 5.1 seconds respectively between judgments.

To avoid bias, it would have been desirable to complete all the observations of 
an experiment before commencing the measurement of any of the series, but 


* Actually in Experiments A and B 70 forms were marked in each series; the first 7 were to enable 
the observer to ‘‘get his eye in,” and the measures of them were not used at all in the reduction. 




















Fig. 1. Printed form used for Experiments A and B. Line used for Experiment A: 7.53 inches between inner edges of bounding marks. Line used for Experiment B: 5.94 inches between inner edges of bounding marks.



from considerations of time and as all the forms were not printed at the commencement this was not done. In some cases therefore a series was measured directly after it had been marked, and if the observer happened to remember that its estimates were considerably too large or too small, his judgment would almost certainly be influenced when marking the next succeeding series; the correlation of judgments within this second series would hardly be altered, but any natural secular change which had been occurring from series to series might be broken*.

The measures of the observations were made with a ruler divided to fiftieths of an inch, so that readings could be taken to one hundredth of an inch with fair accuracy.


(b) Experiments C and D.

These two experiments were carried out with the help of a chronograph. The instrument was run by clockwork, and had a paper tape on which records could be made independently by two pens worked by small electromagnets. One pen was put in circuit with a seconds pendulum, a platinum pointer at the end of which made contact at each swing through the vertical position by cutting through a bead of mercury, the other pen was connected with a tapping key. The rate of the driving clock was not quite uniform, and the pendulum second-marks on the tape were therefore necessary in reckoning the intervals between the marks made by the other pen, corresponding to taps of the key. As the estimate in both experiments was one of 10 seconds, it was found that except for a few cases in Experiment D†, the true value of the time interval between the taps could be represented with sufficient accuracy by the factor $e/p$, where,

* See p. 49, remark in Table I, regarding Series IX and X.

† In Experiment D, some of the estimates had values nearer 20 seconds than 10 seconds, and here half the distance on the tape between the nearest corresponding 20 seconds was taken for $p$.










e was the distance measured on the tape between consecutive marks of the 
key. 

p the length on the tape of the nearest corresponding 10 seconds recorded by the 
pendulum pen. 


Had the pendulum been beating exactly one second, $10 \times \frac{e}{p}$ seconds would have been the true length of the estimate; actually the period as found by comparison for a long run with a watch was,

before Experiments C and D (6th December) 1.020 seconds
after Experiments C and D (16th December) 1.019 seconds

so that the length of estimate with sufficient accuracy is $10.2 \times \frac{e}{p}$ seconds. It is the factor $\frac{e}{p}$ that will be used throughout the reductions.
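The conversion from tape measurements to seconds can be written as a one-line function. The Python sketch below is illustrative; the function and parameter names are hypothetical, and the default period of 1.02 seconds is the rounded value implied by the factor 10.2 in the text.

```python
def estimate_in_seconds(e, p, pendulum_period=1.02):
    """Length of an estimated 10-second interval, in seconds.

    e: tape distance between the two key marks of the estimate;
    p: tape distance covered by the nearest 10 pendulum beats
       (e and p in the same unit of length);
    pendulum_period: period of one pendulum beat in seconds.
    """
    return 10.0 * pendulum_period * e / p
```

An estimate whose tape length equals that of ten pendulum beats therefore corresponds to 10.2 seconds rather than 10.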

















Fig. 2. Shows a small piece of tape, and the points from which the measurements were made.


If the amplitude of the pendulum was rather small, it was sometimes notice- 
able that the intervals between the second marks were alternately longer and 
shorter; this was due either to slight deformation in the shape of the mercury 
bead or (what is really the same thing) from the centre of the bead not having 
been placed exactly under the equilibrium position of the platinum pointer. But 
in taking for measurement the even number of 10 seconds, such errors would be 
inappreciable. 

In both experiments the beginning and end of the estimate were recorded by 
sharp taps on the key (at a and b respectively in Figure 2); a long drawn tap 
(c in figure) then followed to make a break before the next estimate was recorded. 
The interval between the b tap of one observation and the a tap of the following 
varied from 1½ to 2½ seconds. This method of record soon became quite automatic,
and very few mistaps occurred.

The measurements on the tape were made from the sharp beginnings of the 
marks, which correspond to the making of the electric contact at the beginning of 
the tap on the key. 

In Experiment C the counting was “sotto voce,” the first tap being made on
the count “nought,” the last on “ten”; in order that the counts might be quite
uniform the word “sen” was used instead of the two-syllabled “seven.” The
counting was usually done in step to a slight beat of the thumb on the key (not
hard enough, of course, to make contact), and it was fairly easy to keep the
attention concentrated during the counts. In Experiment D there was no counting 
and it was far harder to keep one’s mind fixed; in fact the mental effort required 
was quite noticeable, and I found that a greater interval of rest was required
between each series than for C. It is mainly by reference to the passing of
external events, to changes the duration of which we can infer from previous 
experience, that we estimate any but the shortest intervals of time. In the 
counting experiment, the second-intervals between each of the 10 counts which 
made up the observation were comparatively short, and the beating of the thumb 
or fingers became almost mechanical; the interval of course varied but was not 
subject to violent fluctuations. But while most people are able to estimate a
second interval with fair accuracy, it would need very much practice to estimate a 
10 second interval, and in my case I found it quite impossible to concentrate 
attention for 10 seconds, solely on the passing of time. I soon found myself 
imagining that I saw the seconds’ hand of a watch, passing usually from the 
position where 60 is marked on the dial to the 10; but it was not another case of 
counting, for I did not note the passing of each individual second mark, only 
having a vague idea of the position of the 5 second division line. If I tried to 
think of nothing, my thoughts probably wandered on to other subjects, until 
I came up with a start, and realising that I had very little idea of how long 
before I had pressed the key to start the observation, pressed it to finish, with 
the greatest uncertainty. To keep attention fixed, it appeared that I must try to
record the stages of the passage of 10 seconds, and this I was doing vaguely on 
the imaginary clock face, but I must say that the seconds’ hand was very re- 
fractory, at times appearing to stop or even move backwards, and was often so 
slow that I had to close the observation before it reached the 10 second mark. 

I have given the above description at some length in order to shew that there 
was an essential difference between Experiments C and D, which is borne out by 
the figures of the reduction given later in this paper. The observer with the key 
sat in a separate room where the beats of the chronograph could not be heard. 
Experiment D was actually carried out in the week previous to C; before starting, 
a few trials at estimating 10 seconds had been made with a watch, but these were 
not repeated after the commencement. Again, some 10 second counts were made 
with a watch before starting on C, but no comparison with a watch or clock was 
made during the course of the experiment. The measuring up of C and D was 
left until both experiments were completed, so that the chance of some bias 
to the judgment, which occurred in the case of A and B was avoided. 

(c) Experiment E. 

This consists of nine series of readings made with a Zeiss Comparator at the 
Solar Physics Observatory, Cambridge, on photographic plates of the spectrum of 
Nova Aquilae III. The readings were taken in the first place in order to calculate 
the Probable Errors of the measurements of certain types of structure featuring in 
the broad emission bands, and each series consists of readings taken from 51 
consecutive settings on a particular marking, either a maximum or the edge of a 
maximum. Although the number of readings is not sufficient for any great 
weight to be attached to the results, they are, I think, of sufficient interest to be 
included. In the instrument used, the plate to be measured is fixed to a slide,
which is moved horizontally in a greased slot by pressure with the hand; the
measurer looks through one eyepiece and pushes the slide until the feature on the 
plate of which he is wishing to measure the position, comes under a cross wire in 
the focus of the eyepiece; then looking through a second eyepiece at the scale 
attached to the slide, he takes the reading, the last two figures of which are read 
from a graduated wheel attached to a micrometer screw-head. In making a measurement
there are therefore two adjustments:

(1) The setting of the marking in the plate under the cross wire in the first 
eyepiece. 

(2) The shifting of two very close parallel wires by a micrometer screw in the 
second eyepiece, until a line of division on the scale appears to lie exactly in the 
centre between them. 

Far the greater source of error arises from the first setting, particularly if the 
marking on the plate is not clear cut. In taking a series of measurements, the 
observer should always move the slide from the same direction—that is he should 
always push it or always pull it, until he thinks that the marking is bisected or 
“edged” by the cross wire, and then he should stop; if he obviously overshoots 
the mark he should start again, and not hesitatingly move the slide backwards 
and forwards in search of what he thinks may be the best setting. By shifting the 
slide into position from the same direction, the measures may be all subject to a 
fairly constant personal equation due to “over push” or “under push,” “over 
pull” or “under pull” of the slide, but this effect may be eliminated by reversing 
the plate in the instrument, making a fresh series of measures, and taking the 
mean of the two. In this particular set of readings the slide was always “ pulled ” 
into its final position. 

(d) It is hoped that the results of some further experiments of a different type 
in estimating length which were kindly undertaken for me by Mr E. A. Milne of 
Trinity College, and Mr L. J. Comrie of St John’s College, Cambridge, will be 
included in a future paper. 


IV. TERMINOLOGY. 


Experiments A, B, C and D were arranged in accordance with a uniform 
scheme, each Experiment being divided into 20 “series” consisting of 63 obser- 
vations. The series will be designated by the Roman numerals I, II...XX in the 
order in which they were carried out, and the 63 observations* in a series by the 
letters 

$y_1, y_2, \ldots y_t, \ldots y_{63}.$

In dealing with each Experiment one of the first objects will be to ascertain 
whether there is any correlation between successive judgments, and the manner in 
which this correlation, if existent, falls off as the interval between the judgments 
correlated is increased. To obtain these coefficients of correlation it is necessary 


* The first 7 observations, see footnote, p. 28, being always disregarded. 










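to divide the observations of each series into “groups,” and thus we have the
50 observations

$y_1, y_2, \ldots y_{50}$ form Group 1 with mean $d_1$ and standard deviation $\sigma_1$,
$y_2, y_3, \ldots y_{51}$ form Group 2 with mean $d_2$ and standard deviation $\sigma_2$,
$y_k, y_{k+1}, \ldots y_{50+k-1}$ form Group $k$ with mean $d_k$ and standard deviation $\sigma_k$,
$y_{14}, y_{15}, \ldots y_{63}$ form Group 14 with mean $d_{14}$ and standard deviation $\sigma_{14}$.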


By “the correlation of successive judgments at intervals of one,” I shall understand
the correlation of the 50 observations of Group 1 of a series with the 50
corresponding observations of Group 2 of that series; this will be expressed
as $\rho_1$. Similarly “the correlation of successive judgments at intervals of $k$,” or $\rho_k$,
is the correlation of the corresponding observations in Groups 1 and $k+1$. In fact
$\rho_k$ is given by

$$\rho_k = \frac{\dfrac{1}{50}\sum_{t=1}^{50} y_t\,y_{t+k} - d_1 d_{k+1}}{\sigma_1\,\sigma_{k+1}}.$$

When these constants are to be referred to some particular series, say the
$p$th, the prefix $p$ will be placed before them, e.g. ${}_p\sigma_1$, ${}_p\sigma_k$, ${}_p\rho_k$, etc.
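The definition of the correlation at intervals of k for a single series of 63 observations can be sketched in a few lines. The helper names `group`, `mean`, `sd` and `rho` are introduced here for illustration only; standard deviations are taken with divisor n, as the product-moment formula above implies.

```python
from math import sqrt

def group(series, k, n=50):
    """Group k of a series: observations y_k ... y_{k+n-1} (1-indexed)."""
    return series[k - 1:k - 1 + n]

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    d = mean(xs)
    return sqrt(sum((x - d) ** 2 for x in xs) / len(xs))

def rho(series, k, n=50):
    """Correlation of corresponding observations in Groups 1 and k+1."""
    g1, gk = group(series, 1, n), group(series, k + 1, n)
    product_moment = sum(a * b for a, b in zip(g1, gk)) / n - mean(g1) * mean(gk)
    return product_moment / (sd(g1) * sd(gk))
```

With 63 observations and n = 50 the largest admissible interval is k = 13, corresponding to Group 14.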

A comparison of the d’s, o’s and p’s of the different series will be instructive, 
but as each of these constants has been calculated from 50 observations only, to 
obtain quantities with smaller probable errors we must combine the observations 
of the 20 series. Thus we shall obtain 


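$$D_k = \frac{1}{mn}\sum_m \sum_{t=1}^{n} y_{t+k-1} = \frac{1}{m}\sum_m (d_k) \qquad \text{(v)},$$

where $n = 50$, the number in a group,
$m = 20$, the number of series,
and $\sum_m$ indicates summation for all 20 series.

$$P_k = \frac{1}{mn}\sum_m \sum_{t=1}^{n} y_t\,y_{t+k} - D_1 D_{k+1}
= \frac{1}{m}\sum_m (\rho_k \sigma_1 \sigma_{k+1} + d_1 d_{k+1}) - D_1 D_{k+1},$$

$$P_k = \frac{1}{m}\sum_m (\rho_k \sigma_1 \sigma_{k+1}) + \frac{1}{m}\sum_m (D_1 - d_1)(D_{k+1} - d_{k+1}) \text{ in view of (v)} \qquad \text{(vi)}.$$

Putting $k = 0$ in (vi) we have as the square of the standard deviation

$$S_1^2 = \frac{1}{m}\sum_m (\sigma_1^2) + \frac{1}{m}\sum_m (D_1 - d_1)^2,$$

and similarly

$$S_{k+1}^2 = \frac{1}{m}\sum_m (\sigma_{k+1}^2) + \frac{1}{m}\sum_m (D_{k+1} - d_{k+1})^2 \qquad \text{(vii)},$$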




and finally the coefficient of correlation $R_k$ is given by

$$R_k = \frac{P_k}{S_1\,S_{k+1}} \qquad \text{(viii)}.$$

$D_k$ and $S_k$ are the mean and standard deviation of the combined observations,
1000 in all, of the 20 Groups $k$, while $R_k$ is the correlation between the 1000
observations in the 20 Groups 1 and the corresponding 1000 observations in
the 20 Groups $k+1$, where it must be remembered that owing to the break
between each series the 50th observation in Series I is correlated with the
$(50+k)$th observation in that series, and not with the $k$th in Series II, etc.

It will be seen from the equations (vi) and (viii) that it is possible for $R_k$
to have a large value even though the coefficients of correlation of successive
judgments for the separate series are negligible. For though $\sum_m (\rho_k \sigma_1 \sigma_{k+1})$ may be
zero for $k > p$, let us say, where $p$ may perhaps be 3 or 4, it is clear that the
coefficients for the combined series, $R_k$, will not vanish as $k$ increases unless

$$\sum_m (D_1 - d_1)(D_{k+1} - d_{k+1}) = 0.$$

In fact if $P_k$ (and therefore $R_k$) does not vanish for values of $k$ for which the
$\rho$'s of the individual series vanish, this is a sign of the existence of a secular
change running through the series; the means of the separate series differ
significantly from the mean of the combined 1000 observations, that is to say they

differ significantly from each other. Now it is important to obtain a measure of
the correlation of successive judgments, when freed from this secular term. First
I define $S_k'$ by the relation

$$S_k'^2 = \frac{1}{m}\sum_m (\sigma_k^2) \qquad \text{(ix)},$$

($m = 20$, $\sum_m$ indicating summation for the 20 series); it is the standard deviation
of the 1000 observations in the combined Groups $k$ after the secular change has
been removed. Then $R_k'$ is given by

$$R_k' = \frac{\dfrac{1}{m}\sum_m ({}_p\rho_k\,{}_p\sigma_1\,{}_p\sigma_{k+1})}{S_1'\,S_{k+1}'} \qquad \text{(x)};$$

this is the correlation of successive judgments freed from secular change; before
correlating the observations we are in fact fitting the series means together, by
subtracting ${}_1d_1 - D_1$ from the observations of the 1st Group of Series I, ${}_1d_2 - D_2$
from the 2nd Group and so on, and again subtracting ${}_1d_{k+1} - D_{k+1}$ from the
observations of the $(k+1)$th Group of Series I, etc.
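The distinction between the combined coefficient of (viii) and the coefficient freed from secular change of (x) may be illustrated numerically. The sketch below is an assumption-laden illustration, not the author's computing scheme: the function name `combined_correlations` is hypothetical, and every series is assumed to be long enough to contain Group k+1.

```python
from math import sqrt

def _group_stats(series, k, n):
    """Observations, mean and standard deviation (divisor n) of Group k."""
    g = series[k - 1:k - 1 + n]
    d = sum(g) / n
    s = sqrt(sum((y - d) ** 2 for y in g) / n)
    return g, d, s

def combined_correlations(all_series, k, n=50):
    """Return (R_k, R_k') for m series, following (v) to (x)."""
    m = len(all_series)
    pooled1, pooledk = [], []
    within = var1 = vark = 0.0
    for s in all_series:
        g1, d1, s1 = _group_stats(s, 1, n)
        gk, dk, sk = _group_stats(s, k + 1, n)
        pooled1 += g1
        pooledk += gk
        # rho_k * sigma_1 * sigma_{k+1} for this series
        within += sum((a - d1) * (b - dk) for a, b in zip(g1, gk)) / n
        var1 += s1 ** 2
        vark += sk ** 2
    N = m * n
    D1, Dk = sum(pooled1) / N, sum(pooledk) / N
    P = sum(a * b for a, b in zip(pooled1, pooledk)) / N - D1 * Dk   # (vi)
    S1 = sqrt(sum((a - D1) ** 2 for a in pooled1) / N)               # (vii)
    Sk = sqrt(sum((b - Dk) ** 2 for b in pooledk) / N)
    R = P / (S1 * Sk)                                                # (viii)
    R_prime = (within / m) / sqrt((var1 / m) * (vark / m))           # (ix), (x)
    return R, R_prime
```

For two artificial series that alternate between +1 and -1 but differ in level by 10, the within-series correlation at interval 1 is -1, yet the combined coefficient is strongly positive, which is exactly the effect of a secular change described above.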

Again it may be desirable to examine the residuals after a sessional change 
has been removed from the observations of each series, in addition to the general 
secular term. Suppose that an observation in the pth Series can be expressed in 
the form introduced on page 25 


$${}_py_t = \phi(\tau_p) + f_p(t) + {}_pY_t$$






















where $\phi(\tau_p)$ represents the secular term which we take as constant for all the
observations of the $p$th Series, and $f_p(t)$ gives the sessional change, then $S_1''$ will
be the standard deviation of the 1000 residuals in the twenty 1st Groups, $S_k''$ of
the 1000 residuals in the twenty $k$th Groups, etc., so that

$$S_k''^2 = \frac{1}{mn}\sum_m \sum_{t=1}^{n} ({}_pY_{t+k-1}^2) \qquad \text{(xi)},$$

the mean of the residuals being zero, and $m = 20$, $n = 50$ again; while the correlation
of the successive residuals at intervals of $k$, after the removal of secular and
sessional terms, or $R_k''$ will be given by

$$R_k'' = \frac{\dfrac{1}{mn}\sum_m \sum_{t=1}^{n} (Y_t\,Y_{t+k})}{S_1''\,S_{k+1}''} \qquad \text{(xii)}.$$


TABLE OF CONSTANTS. 

In the following table definitions are given of the most important of the 
constants referred to in the preceding section and of others to be introduced 
in the sequel. 

1. The $k$th Group of the $p$th Series consists of the 50 observations

${}_py_k, {}_py_{k+1}, \ldots {}_py_{k+50-1}.$

As each Series consists of 63 observations, there are 14 Groups in each of the
20 Series,
$n$ will often be used for 50, the number of observations in a Group,
$m$ will often be used for 20, the number of Series.

2. The crude Observations.

(a) For the $p$th Series.

${}_pd$ = mean of the whole 63 observations.

${}_pd_k$ = mean of observations in $k$th Group.

${}_p\sigma_k$ = standard deviation of observations in $k$th Group.

${}_p\rho_k$ = coefficient of correlation between corresponding observations of Groups 1
and $k+1$, i.e. between ${}_py_1$ and ${}_py_{k+1}$, ${}_py_2$ and ${}_py_{k+2}$, etc.

${}_p\sigma_\delta$ = standard deviation of the first forward differences of the observations in
Group 1, i.e. of ${}_py_2 - {}_py_1$, ${}_py_3 - {}_py_2$, $\ldots$ ${}_py_{n+1} - {}_py_n$.

${}_pb$ = slope of the straight line $y - {}_pd_1 = {}_pb\left(t - \frac{n+1}{2}\right)$ which fits “best” the
50 observations ${}_py_1, {}_py_2, \ldots {}_py_t, \ldots {}_py_n$ of Group 1.

${}_p\sigma_k'$ = standard deviation of residuals left after the ordinates of this “best”
fitting straight line have been subtracted from the observations of Group $k$.

${}_p\rho_k'$ = coefficient of correlation between these residuals of Group 1 and Group
$k+1$.




In the reduction of the results of the experiments, unless it is necessary to 
specify a particular series, the prefix p before these constants will usually be 
omitted for brevity. 

(b) For the combined 20 series. 

D = mean of the whole 1260 (= 20 x 63) observations of an experiment. 

$D_k$ = mean of the 1000 observations in the combined $k$th Groups of the
20 series.

$S_k$ = standard deviation of the 1000 observations in the combined $k$th Group
of the 20 series.

$R_k$ = coefficient of correlation between the 1000 observations in the 1st Groups
and the 1000 corresponding observations in the $(k+1)$th Groups.

${}_sR_k$ = coefficient of correlation between the 1000 $s$th forward differences of the
observations in the 1st Groups and the corresponding differences of the observations
in the $(k+1)$th Groups.

$S_\delta$ = standard deviation of the 1000 first forward differences of the observations
in the 1st Groups.

3. The Observations freed from the Secular Change.

The “secular term” in the observation ${}_py_t$ considered as a member of the $k$th
Group is ${}_pd_k$. Thus the mean of the 1000 observations in the $k$th Groups each
freed from its secular term will be zero.

$S_k'$ = standard deviation of the 1000 observations (freed from secular term) in
the $k$th Groups.

$R_k'$ = coefficient of correlation between the 1000 observations in the 1st Groups
and the 1000 corresponding observations in the $(k+1)$th Groups (all freed from
secular term).

4. The Observations freed from both Secular and Sessional Change.

$y = f_p(t)$ is the curve representing the sessional change in the $p$th Series,
so that $f_p(t)$ is the “sessional term” in ${}_py_t$, the $t$th observation in the $p$th Series.

${}_pY_t$ = the residual left after removing the secular and sessional terms from ${}_py_t$.

$S_k''$ = standard deviation of the 1000 $Y$'s in the $k$th Groups.

$R_k''$ = coefficient of correlation between the 1000 $Y$'s in the 1st Groups and
the corresponding 1000 $Y$'s in the $(k+1)$th Groups.

p% =the part of ,Y; representing the actual estimate which the observer 
wishes to record. 

pS: = the part of »Y; representing a complex of accidental errors superimposed 
on ,a in the process of record. 

$G_k$ = standard deviation of the sessional terms in the 1000 observations of the
$k$th Groups.

$F_k$ = 1st order product moment coefficient about the mean of these sessional
terms in the 1st Groups and the corresponding terms in the $(k+1)$th Groups.







V. ON METHODS OF REDUCTION.

(a) Variate Difference Correlation. 

It will become evident in the detailed discussion of the results of the experi- 
ments, that a considerable part of the correlation of the successive judgments 
is due to a secular change with time, occurring from series to series, and in the 
case of the Trisections, to a sessional change as well occurring within the series ; 
I therefore propose to consider at this point how far the Variate Difference Corre- 
lation Method is applicable in this type of problem, and to do this will approach 
the matter from a slightly more general point of view than that of “Student” in 
Biometrika, Vol. x. p. 179.

Suppose that $x$ and $y$ are the two variables to be correlated, with corresponding
values

$$x_1, x_2, \ldots x_t, \ldots x_{\nu+n},$$
$$y_1, y_2, \ldots y_t, \ldots y_{\nu+n},$$

and that we may express $x_t$ and $y_t$ in the form

$$x_t = F_1(t) + X_t,$$
$$y_t = F_2(t) + Y_t,$$

where $F_1(t)$ and $F_2(t)$ are polynomials of degree $n$ in $t$, the unit of $t$ being the
interval of time or space between the successive values of the variates, which is
supposed equal and constant; $X_t$ and $Y_t$ are independent of the secular or sessional
change represented by $F_1$ and $F_2$.

Let us now obtain a general expression for

(1) $r_{\Delta_n x_t\,\Delta_n y_t}$ or ${}_nR$, the correlation of the $n$th forward differences of $x_t$ and $y_t$.

(2) $r_{\Delta_n X_t\,\Delta_n Y_t}$ or ${}_nR'$, the correlation of the $n$th forward differences of $X_t$ and $Y_t$.

Now

$$\Delta_n x_t = (1-\epsilon)^n x_{n+t} = x_{n+t} - n\,x_{n+t-1} + \ldots + (-1)^s \frac{n!}{s!\,(n-s)!}\,x_{n+t-s} + \ldots + (-1)^n x_t \qquad \text{(xiii)},$$

where the operator $\epsilon$ is defined by $\epsilon\,x_s = x_{s-1}$, etc.


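Further we must assume that

(a) $\sum_{t=1}^{\nu} x_{t+h}$ = constant for all values of $h$ small compared with $\nu$,
= 0, by suitable choice of origin,

$\sum_{t=1}^{\nu} y_{t+h} = 0$,

from which it follows that $\sum_{t=1}^{\nu} \Delta_n x_t = 0 = \sum_{t=1}^{\nu} \Delta_n y_t$.

(b) $\sum_{t=1}^{\nu} (x_{t+h}^2)$ = constant = $\nu\sigma_x^2$ for all values of $h$ small compared with $\nu$,

$\sum_{t=1}^{\nu} (y_{t+h}^2) = \nu\sigma_y^2$ for all values of $h$ small compared with $\nu$.

(c) $\sum_{t=1}^{\nu} (x_{t+h}\,x_{t+h+k}) = \nu \times {}_x\rho_k\,\sigma_x^2$ for all values of $h$ small compared with $\nu$,

$\sum_{t=1}^{\nu} (y_{t+h}\,y_{t+h+k}) = \nu \times {}_y\rho_k\,\sigma_y^2$ for all values of $h$ small compared with $\nu$,

$\sum_{t=1}^{\nu} (x_{t+h+k}\,y_{t+h}) = \nu \times {}_{xy}\rho_k\,\sigma_x\sigma_y$ for all values of $h$ small compared with $\nu$.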
t=1 


Similar relations will hold for the residuals X and Y. 
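Then a little consideration shews that the sum of the coefficients of the
products of the $x$'s and $y$'s whose indices differ by $p$ in the expression
$\Delta_n x_t\,\Delta_n y_t$ or $(1-\epsilon)^n x_{n+t}\,(1-\epsilon)^n y_{n+t}$
is the coefficient of $\nu \times {}_{xy}\rho_p\,\sigma_x\sigma_y$ in the product moment

$\sum_{t=1}^{\nu} \Delta_n x_t\,\Delta_n y_t$; call this coefficient $a_p$.

Now $\epsilon_1^{\,r}$ operating on $x_{n+t}$ gives $x_{n+t-r}$,
and $\epsilon_2^{\,r'}$ operating on $y_{n+t}$ gives $y_{n+t-r'}$,

and if $(n+t-r) - (n+t-r') = p$, then $r' - r = p$; hence $a_p$ is the sum of the
coefficients of the products $\epsilon_1^{\,r}\epsilon_2^{\,r'}$ in the expansion of $(1-\epsilon_1)^n (1-\epsilon_2)^n$ for which
$r' - r = p$, or the coefficient of $\epsilon^p$ in

$$\left(1 - \frac{1}{\epsilon}\right)^n (1-\epsilon)^n,$$

or of $\epsilon^{n+p}$ in $(-1)^n (1-\epsilon)^{2n}$,

so that $$a_p = (-1)^p \frac{(2n)!}{(n+p)!\,(n-p)!}.$$

Hence finally writing $j = n + p$ we have

$$\frac{1}{\nu}\sum_{t=1}^{\nu} \Delta_n x_t\,\Delta_n y_t = \sigma_x\sigma_y \sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,{}_{xy}\rho_{j-n} \qquad \text{(xiv)},$$

where negative values of the subscript of $\rho$ imply that the subscript of $x$ is less
than that of $y$; e.g. ${}_{xy}\rho_{-p}$ is the correlation between $x_t$ and $y_{t+p}$.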
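The closed form for the coefficient a_p can be checked against a direct expansion of the two nth differences. The following sketch does so by brute force; the function names `a_closed` and `a_direct` are introduced here for illustration and are not the author's.

```python
from math import comb

def a_closed(n, p):
    """Coefficient a_p = (-1)^p (2n)! / ((n+p)! (n-p)!) for -n <= p <= n."""
    return (-1) ** abs(p) * comb(2 * n, n + p)

def a_direct(n, p):
    """Sum of products of difference coefficients whose indices differ by p.

    The nth forward difference has coefficient (-1)^r C(n, r) on the term
    displaced r steps back, so a_p collects all pairs (r, r + p).
    """
    c = [(-1) ** r * comb(n, r) for r in range(n + 1)]
    return sum(c[r] * c[r + p] for r in range(n + 1) if 0 <= r + p <= n)
```

For first differences (n = 1) both functions give 2 for p = 0 and -1 for p = ±1, the familiar weights of the first-difference equations used later.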


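Similarly for the standard deviations of the $n$th differences

$$\frac{1}{\nu}\sum_{t=1}^{\nu} (\Delta_n x_t)^2 = \sigma_x^2 \sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,{}_x\rho_{j-n} \qquad \text{(xv)},$$

$$\frac{1}{\nu}\sum_{t=1}^{\nu} (\Delta_n y_t)^2 = \sigma_y^2 \sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,{}_y\rho_{j-n} \qquad \text{(xvi)},$$

and for the correlation between the differences

$${}_nR = \frac{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,{}_{xy}\rho_{j-n}}
{\sqrt{\left\{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,{}_x\rho_{j-n}\right\}\left\{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,{}_y\rho_{j-n}\right\}}} \qquad \text{(xvii)}.$$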


The correlation of the $n$th forward differences of the residuals $X_t$ and $Y_t$, or ${}_nR'$,
will equal an exactly similar expression to the last, in which ${}_{XY}\rho$, ${}_X\rho$ and ${}_Y\rho$ are
substituted for ${}_{xy}\rho$, ${}_x\rho$ and ${}_y\rho$. But as $F_1(t)$ and $F_2(t)$ are polynomials of degree
$n$ in $t$, we know that

$$\Delta_n x_t = \Delta_n X_t + \text{constant}, \qquad \Delta_n y_t = \Delta_n Y_t + \text{constant},$$

and therefore

$${}_nR = r_{\Delta_n x_t\,\Delta_n y_t} = r_{\Delta_n X_t\,\Delta_n Y_t} = {}_nR',$$

that is to say we may equate ${}_nR$ to an expression similar to that on the right hand
side of (xvii) above, except that the correlation coefficients of the residuals, namely
${}_{XY}\rho$, ${}_X\rho$ and ${}_Y\rho$, are to be substituted for ${}_{xy}\rho$, ${}_x\rho$ and ${}_y\rho$.

Now in the usual problem to which the Variate Difference Method is applied
it is assumed that after taking a sufficient number of differences we shall approach
a state in which the corresponding values of $X_t$ and $Y_t$, the residuals left after
the ordinates of an $n$th order parabola have been subtracted from $x_t$ and $y_t$, are
mutually at random in time or space; or that

$${}_{XY}\rho_p = 0, \quad {}_X\rho_p = 0, \quad {}_Y\rho_p = 0,$$

for all values of $p$ other than zero, and that

$${}_X\rho_0 = 1 = {}_Y\rho_0, \quad {}_{XY}\rho_0 = r_{XY},$$

i.e. the correlation between $X_t$ and $Y_t$. Upon this assumption it follows at once
from the modified form of (xvii) that

$${}_nR = {}_{XY}\rho_0 \quad \text{or} \quad r_{\Delta_n x_t\,\Delta_n y_t} = r_{XY},$$

the fundamental relation of the original Variate Difference Correlation Method.

Let us now turn to the particular type of problem in which we wish to corre- 
late the successive values of the same variate. If we are correlating the values at 
intervals of $k$, we shall have as corresponding variables, not $x_t$ and $y_t$ but
$y_t$ and $y_{t+k}$, so that

${}_{xy}\rho_{j-n}$ becomes $\rho_{k+j-n}$ and ${}_{XY}\rho_{j-n}$ may be written $\rho^{(n)}_{k+j-n}$,
${}_x\rho_{j-n}$ becomes $\rho_{j-n}$ and ${}_X\rho_{j-n}$ may be written $\rho^{(n)}_{j-n}$,
${}_y\rho_{j-n}$ becomes $\rho_{j-n}$ and ${}_Y\rho_{j-n}$ may be written $\rho^{(n)}_{j-n}$,

where as in the notation of page 35 $\rho_p$ is the correlation of successive values of
the variate at intervals of $p$, and $\rho_p^{(n)}$ the correlation of successive residuals
(at intervals of $p$) which are left after the subtraction of the ordinates of an
$n$th order parabola representing the secular change. Hence we have from equation
(xvii) that ${}_nR_k$, or the correlation between the $n$th forward differences of $y_t$ and $y_{t+k}$,
is given by

$${}_nR_k = \frac{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,\rho_{k+j-n}}{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,\rho_{j-n}} \qquad \text{(xviii)},$$

$${}_nR_k = \frac{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,\rho^{(n)}_{k+j-n}}{\displaystyle\sum_{j=0}^{2n} (-1)^{n+j} \frac{(2n)!}{(2n-j)!\,j!}\,\rho^{(n)}_{j-n}} \qquad \text{(xix)},$$

where negative values of the subscript of $\rho$ and $\rho^{(n)}$ are to be treated as positive:
e.g. if $k=1$, $n=5$, $j=1$, then $\rho_{k+j-n} = \rho_{-3} = \rho_3$.
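Equation (xviii) lends itself to direct computation once the lag correlations are known. The sketch below evaluates it for any order n and interval k; the function name `diff_correlation` and the list layout (index p holding the correlation at interval p, with index 0 equal to 1) are assumptions made for the illustration.

```python
from math import comb

def diff_correlation(rho, n, k):
    """Correlation of nth forward differences at interval k, per (xviii).

    rho: list with rho[0] == 1 and rho[p] the correlation at interval p;
         it must extend at least to index k + n.
    Negative subscripts are treated as positive, as stated in the text.
    """
    def weighted_sum(shift):
        return sum((-1) ** (n + j) * comb(2 * n, j) * rho[abs(shift + j - n)]
                   for j in range(2 * n + 1))
    return weighted_sum(k) / weighted_sum(0)
```

For a variate whose successive values are uncorrelated, this gives -1/2 for first differences at interval 1 and -2/3 for second differences, which shows plainly that the differences of random residuals are themselves correlated.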

We are again supposing that this secular change can be represented by $y = f(t)$,
a polynomial of degree $n$ in $t$, but we cannot expect that after removing a parabola
of even 5th or 6th order*, the residuals $Y_1, Y_2, \ldots Y_t, \ldots Y_\nu$ will be mutually at
random in time or space; if we anticipate correlation between $Y_t$ and $Y_{t+k}$,
we must also be prepared for correlation between $Y_t$ and $Y_{t+k-1}$, and in any case
the correlation between $Y_t$ and $Y_t$, or $\rho^{(n)}_{k+j-n}$ where $j = n - k$, will be unity.

Hence we cannot make the assumptions of the first problem (that ${}_{XY}\rho_p = 0$, etc.),
in fact

$r_{\Delta_n Y_t\,\Delta_n Y_{t+k}}$ is not equal to $r_{Y_t\,Y_{t+k}}$.


Now consider the use which may be made of equations (xviii) and (xix). If
the values of the $\rho_p$'s have been calculated from the crude values of the variate,
the quickest method of finding the correlations of differences ${}_nR_k$ is not by direct
calculation but by putting these known values of the $\rho_p$'s into the right hand side
of (xviii). Then using (xix) we have a number of equations connecting the
$\rho_p^{(n)}$'s, and the question that at once arises is whether there are sufficient
equations to determine these coefficients? It will be seen at once that there
cannot be; if we are proceeding to $n$th differences, we can obtain $q$ equations by
putting $k = 1, 2, \ldots q$, but these will contain coefficients $\rho_1^{(n)}$ to $\rho_{n+q}^{(n)}$; in fact
$n$ more equations are required. By using the appropriate equations for the
Product Moments and for the Standard Deviation of $n$th differences corresponding
to (xiv), (xv) and (xvi) we could obtain one further equation, but at the same
time we introduce one further unknown, the standard deviation of the residuals.

That these equations will be indeterminate, can be seen from another standpoint;
the $n$th difference correlation equations (xviii) and (xix) will be satisfied
not only by the $\rho_p$'s and $\rho_p^{(n)}$'s as defined above, but by the correlation of the
residuals left after the ordinates of a parabola of any order less than $n$, have been
subtracted from the crude observations. Nor can further equations for the
$\rho_p^{(n)}$'s be obtained by proceeding to $n+1$, or higher differences; the further
relations obtained will not be independent, for example

$${}_{n+1}R_1 = \frac{-1 + 2\,{}_nR_1 - {}_nR_2}{2\,(1 - {}_nR_1)}, \text{ etc.}$$

The possible application of these difference correlation equations is considered
in the next section.


(b) The Application of the Results of the preceding Section. 

Although the correlation of differences does not appear to provide a general
method for obtaining the correlation of successive values of a variate after secular 
changes have been removed, the equations (xviii) and (xix) will be found of con- 
siderable assistance in certain cases. 


* The figures will probably not warrant the taking of differences of much higher orders than 
5th or 6th. 




























The results of the analysis given in the three illustrative problems below will 
be used in obtaining the values of various constants in the reduction of the 
experiments in the later sections. It seemed desirable to collect the algebra 
together in this way, but in reading this paper the reader may find it more 
convenient to pass on and refer back to the theory when occasion arises for the 
numerical application of the results. 


Problem 1. In this and the following illustrations of the method of the 
preceding section, the notation of Section IV for the correlation of judgment will 
be used. 

I shall suppose that we have $m$ series of observations through the course of
which there is some form of secular change; the means of the different series, or
the values of ${}_pd_1$ varying considerably. The coefficients of correlation for the
combined series, $R_1, R_2, \ldots R_k, \ldots R_s$ have been calculated, and also the single
coefficient $R_1'$, the correlation of the successive values of the observations (at
intervals of 1) after the series means have been fitted together, i.e. after removal
of secular change.

It is clear that $\Delta_1 y_t = \Delta_1 Y_t'$, where $y_t = d_1 + Y_t'$, within any one series, and

$$\frac{1}{mn}\sum_m \sum_{t=1}^{n} (\Delta_1 y_t\,\Delta_1 y_{t+k}) = \frac{1}{mn}\sum_m \sum_{t=1}^{n} (\Delta_1 Y_t'\,\Delta_1 Y_{t+k}') \text{ etc.},$$

where $\sum_m$ again stands for summation for the $m$ series, so that the 1st difference
correlation equations (xviii) and (xix) are applicable, and become

$${}_1R_1 = \frac{-1 + 2R_1 - R_2}{2\,(1 - R_1)}, \qquad {}_1R_k = \frac{-R_{k-1} + 2R_k - R_{k+1}}{2\,(1 - R_1)}, \quad k = 2 \text{ to } s-1 \qquad \text{(xx)},$$

$${}_1R_1 = \frac{-1 + 2R_1' - R_2'}{2\,(1 - R_1')}, \qquad {}_1R_k = \frac{-R_{k-1}' + 2R_k' - R_{k+1}'}{2\,(1 - R_1')}, \quad k = 2 \text{ to } s-1 \qquad \text{(xxi)}.$$

From (xx) we get the values of ${}_1R_k$, $k = 1, 2 \ldots s-1$, and using these and the value
of $R_1'$ already supposed to be known, the $s-1$ equations (xxi) will give the $s-1$
unknowns $R_2', \ldots R_s'$.
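The procedure of Problem 1 amounts to a simple recurrence: each equation of (xxi) is solved for the next unknown coefficient in turn. A minimal sketch follows, assuming lists indexed so that position 0 holds the value 1 and position k holds the coefficient at interval k; the function names are hypothetical.

```python
def first_difference_correlations(R):
    """Equation (xx): R is [1, R_1, ..., R_s]; returns {k: 1R_k}, k = 1..s-1."""
    denom = 2 * (1 - R[1])
    return {k: (-R[k - 1] + 2 * R[k] - R[k + 1]) / denom
            for k in range(1, len(R) - 1)}

def recover_freed_correlations(dR, R1_prime):
    """Equation (xxi) solved step by step for R'_2, ..., R'_s.

    dR: {k: 1R_k} as returned above; R1_prime: the directly calculated R'_1.
    Each step rearranges (xxi) as
        R'_{k+1} = 2 R'_k - R'_{k-1} - 2 (1 - R'_1) * 1R_k.
    """
    Rp = [1.0, R1_prime]
    for k in sorted(dR):
        Rp.append(2 * Rp[k] - Rp[k - 1] - 2 * (1 - R1_prime) * dR[k])
    return Rp
```

Because each new coefficient is built from the two preceding ones, any error in the difference correlations accumulates along the chain, so the later coefficients deserve less confidence than the earlier.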

The accuracy of this method will of course depend on the errors involved in 
the assumptions (a), (b), and (c) of page 37 above. 

Problem 2. To obtain the coefficients of correlation of the successive residuals 
left after the ordinates of the “best” fitting straight lines have been subtracted 
from each of m series of observations, that is, after the removal of a linear sessional 
change as well as a secular change. In the notation of p. 35 these coefficients
may therefore be written

$$R_1'', R_2'', \ldots R_k'', \ldots.$$

In the first place let us obtain the constants of the straight line “best” fitting
the 50 observations of Group 1 of a series; this can be done by the method of
Least Squares.

If for any series the equation to the line is

$$y = d + b\left(t - \frac{n+1}{2}\right) \quad (n = 50 \text{ as before}) \qquad \text{(xxii)},$$

where the $t$th observation is

$$y_t = d + b\left(t - \frac{n+1}{2}\right) + Y_t,$$

we have that

$$K = \sum_{t=1}^{n} Y_t^2 = \sum_{t=1}^{n} \left\{y_t - d - b\left(t - \frac{n+1}{2}\right)\right\}^2 \text{ is to be a minimum,}$$

therefore $\dfrac{\partial K}{\partial d} = 0$ and $\dfrac{\partial K}{\partial b} = 0$,

or $\sum_{t=1}^{n} (y_t - d) = 0$, whence $\sum_{t=1}^{n} y_t = nd$,

and $\sum_{t=1}^{n} \left\{y_t - d - b\left(t - \frac{n+1}{2}\right)\right\}\left(t - \frac{n+1}{2}\right) = 0$,

giving $$\sum_{t=1}^{n} y_t\left(t - \frac{n+1}{2}\right) = b \sum_{t=1}^{n} \left\{t^2 - (n+1)\,t + \tfrac{1}{4}(n+1)^2\right\}.$$

Or, the first order product moment coefficient about the mean of $y_t$ and $t$ is

$$p_{yt} = b\,\frac{n^2 - 1}{12},$$

giving for the constants of the best fitting line

$$d = d_1 = \frac{1}{n}\sum_{t=1}^{n} y_t, \qquad b = \frac{12\,p_{yt}}{n^2 - 1}.$$
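The least-squares constants of the best fitting line are easily computed from a Group of observations. The sketch below assumes equally spaced observations numbered 1 to n, as in the text; the function names `best_line` and `residuals` are introduced only for this illustration.

```python
def best_line(y):
    """Return (d, b) for the line y = d + b (t - (n+1)/2), t = 1..n."""
    n = len(y)
    d = sum(y) / n
    centre = (n + 1) / 2
    # first order product moment of y_t and t about their means
    p = sum((t - centre) * (yt - d) for t, yt in enumerate(y, start=1)) / n
    b = 12 * p / (n * n - 1)
    return d, b

def residuals(y):
    """Residuals Y_t left after subtracting the best fitting line."""
    d, b = best_line(y)
    centre = (len(y) + 1) / 2
    return [yt - d - b * (t - centre) for t, yt in enumerate(y, start=1)]
```

The residuals so obtained have zero mean and zero product moment with t, which are the two normal equations from which the constants were derived.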
The next step is to obtain the correlation of the successive residuals left after 
the ordinates of this line have been subtracted from the observations. 
We shall have that 


men eso (Pp) a} fese(een tf) rr 


— 
bo 
a 


— nd (d + tet) 


> = (yet yrs) — nd +b S \(t+ fms = 1) Gnd) 
=1 2 


_ oF a ln -"e\(e-2 > 
Bee 5 +1) a)| ve > (1 5) (t >) 


+ > Y, Vin — nd? -—d (Yau-) 
t=1 
n—-1 
= 2 7 Visi tb 2npy + iicse ss d) + 3 (4% — @) = Yas +i 


° Lad ( o =i) 
ahs je. 
b> ic nt re j 


t=1 - 





renee * 1 ei i. 
> Y.Y¥ini+6 es n+"5 yous 0 aa ) 
t=1 2 


n ha 7 
=> aes 3" (n?—-1)+b? tS \ rt . Yous — nd 
t=1 2 2 


























Egon S. PEARSON 43 


and if p,’ be the correlation of the successive residuals and a,’ and o,' the corre- 
sponding standard deviations in Groups 1 and 2, we have finally 





pi oy oe! = poi ey — “ (n?—1)— = {(n +1) y+ (m—1) Yagi — 2nd} ...(xxiii). 


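Similarly we have

$$n\sigma_1'^2 = \sum_{t=1}^{n} \left\{y_t - d - b\left(t - \frac{n+1}{2}\right)\right\}^2$$

$$= \sum_{t=1}^{n} y_t^2 - nd^2 - 2b\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)(y_t - d) + b^2\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)^2$$

$$= n\sigma_1^2 - 2bn\,p_{yt} + \frac{b^2}{12}\,n(n^2 - 1),$$

whence it follows that

$$\sigma_1'^2 = \sigma_1^2 - \frac{b^2}{12}(n^2 - 1) \qquad \text{(xxiv)}.$$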


And again, the residuals of Group 2 having the mean $d_2 - d - b$,

$$n\sigma_2'^2 = \sum_{t=1}^{n} \left\{y_{t+1} - d_2 - b\left(t - \frac{n+1}{2}\right)\right\}^2$$

$$= n\sigma_2^2 - 2b\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right) y_{t+1} + b^2\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)^2$$

$$= n\sigma_2^2 - 2bn\,p_{yt} - b\left\{(n+1)\,y_1 + (n-1)\,y_{n+1} - 2nd\right\} + \frac{b^2}{12}\,n(n^2 - 1),$$

so that

$$\sigma_2'^2 = \sigma_2^2 - \frac{b^2}{12}(n^2 - 1) - \frac{b}{n}\left\{(n+1)\,y_1 + (n-1)\,y_{n+1} - 2nd\right\} \qquad \text{(xxv)}.$$


If the values of $\rho_1'$ have been calculated by this means for each of the $m$ series,
we shall have for the combined series,

$$R_1'' = \frac{\sum_m (\rho_1'\sigma_1'\sigma_2')}{\sqrt{\sum_m (\sigma_1'^2)\,\sum_m (\sigma_2'^2)}} \qquad \text{(xxvi)},$$

a modified form of equation (xii).


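As we are subtracting the ordinates of a different straight line from each
series, a modification of the first-difference equations may be necessary. The
1st order product moment coefficient, for the $m$ combined series*, of successive
first differences at intervals of $k$ is given by

$${}_1P_k = \frac{1}{mn}\sum_m \sum_{t=1}^{n} (y_t - y_{t+1})(y_{t+k} - y_{t+k+1}) - \left\{\frac{1}{m}\sum_m \frac{y_1 - y_{n+1}}{n}\right\}\left\{\frac{1}{m}\sum_m \frac{y_{k+1} - y_{k+n+1}}{n}\right\}$$

$$= \frac{1}{mn}\sum_m \sum_{t=1}^{n} (Y_t - Y_{t+1} - b)(Y_{t+k} - Y_{t+k+1} - b) - \left\{\frac{1}{m}\sum_m \left(\frac{Y_1 - Y_{n+1}}{n} - b\right)\right\}\left\{\frac{1}{m}\sum_m \left(\frac{Y_{k+1} - Y_{k+n+1}}{n} - b\right)\right\}$$

$$= \frac{1}{mn}\sum_m \sum_{t=1}^{n} (Y_t - Y_{t+1})(Y_{t+k} - Y_{t+k+1}) - \left\{\frac{1}{m}\sum_m \frac{Y_1 - Y_{n+1}}{n}\right\}\left\{\frac{1}{m}\sum_m \frac{Y_{k+1} - Y_{k+n+1}}{n}\right\}$$
$$\quad - \frac{1}{mn}\sum_m b\,(Y_1 - Y_{n+1} + Y_{k+1} - Y_{k+n+1}) + \frac{\bar{b}}{mn}\sum_m (Y_1 - Y_{n+1} + Y_{k+1} - Y_{k+n+1}) + \frac{1}{m}\sum_m b^2 - \bar{b}^2.$$

Or finally,

$${}_1P_k = (-R_{k-1}'' + 2R_k'' - R_{k+1}'')\,S''^2 - Q_k + \sigma_b^2 \qquad \text{(xxvii)},$$

making the assumptions (a), (b), and (c) of p. 37, and where

1. $Q_k = \dfrac{1}{mn}\sum_m b\,(Y_1 - Y_{n+1} + Y_{k+1} - Y_{k+n+1}) - \dfrac{\bar{b}}{mn}\sum_m \{Y_1 - Y_{n+1} + Y_{k+1} - Y_{k+n+1}\}$, $\bar{b}$ being the mean of the $b$'s,

2. $\sigma_b$ is the standard deviation of the $b$'s.

There will be similar corrected expressions for the standard deviations of the
combined first differences.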


If we are justified in neglecting terms of the order of $Q_k + \sigma_b^2$, we may use the
first difference equations,


R _— Reuit 2R, — Ris 


re 2(1-R 
° aera ee (xxviil), 
—_~ Rea + =e. ‘= Rin’ << ee 
2(1 RB,’ ; ~ oe 


where, as in Problem 1, the known R,’s will give the ,R,’s, and it will only be 
necessary to calculate directly the one quantity R,”, in order to obtain 


R,”, . oo 


Problem 3. In the last illustration it may happen that while $Q_k + \sigma_b^2$ is so
small as to cause only a negligible error in the value of $R_2''$ found from

\[ {}_1R_1 = \frac{-1 + 2R_1'' - R_2''}{2\,(1 - R_1'')}, \]


* ${}_pb$ is the slope of the best fitting line in the pth Series.
















Egon S. PEARSON 45


the cumulative effect of this error may be considerable in the value found for
$R_s''$ (s = 12, say). If then we take second differences


1 X 
oP, = — XX (Yt — Wess + Yrso) (Yere — Werks + Yerere) 


MUN m t=1 


a. {3 Yi — Yo — Ynui + tn {2 Ses Hee Yaenes F Yhsnes) 





m !mn n x n 


1 n 
= mn > Pal ( Y; i 2 Via + Vi42) ( Vink _ 2 ) (ieee + Ye+ese) 





~ fy Yar Yor Vout Vaud fy Viou= Vasa~ Foon Vane 
me? in n Ls n ae 

= $(R_{k-2}'' - 4R_{k-1}'' + 6R_k'' - 4R_{k+1}'' + R_{k+2}'')\, S''^2$,
and is independent of the differing values of the b's.

The appropriate equations are in fact of type,

\[ {}_2R_k = \frac{R_{k-2}'' - 4R_{k-1}'' + 6R_k'' - 4R_{k+1}'' + R_{k+2}''}{2\,(3 - 4R_1'' + R_2'')} \tag{xxix} \]

for k = 1, 2, 3 ... s - 2, where $R_{-1}'' = R_1''$ etc. and $R_0'' = 1$. Then using the known
value of $R_1''$, and that of $R_2''$, found as in Problem 2 from the first difference
equation, these s - 2 equations will give the s - 2 unknowns $R_3'' \ldots R_s''$.
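The step from the second-difference correlations back to the correlations themselves can be sketched as a simple forward recursion. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the list `R` holds $R_0'' = 1, R_1'', R_2'', \ldots$, and the symmetric convention $R_{-j}'' = R_j''$ is used as in the text.

```python
def second_difference_correlations(R):
    """From R[0..s] (with R[0] == 1) return {k: 2R_k} for k = 1 .. s-2,
    using the type of equation (xxix)."""
    s = len(R) - 1

    def r(j):
        # symmetric extension: R_{-j} = R_j
        return R[abs(j)]

    denom = 2.0 * (3.0 - 4.0 * R[1] + R[2])
    return {
        k: (r(k - 2) - 4 * r(k - 1) + 6 * r(k) - 4 * r(k + 1) + r(k + 2)) / denom
        for k in range(1, s - 1)
    }


def recover_correlations(R1, R2, two_R):
    """Invert the equations: from R_1, R_2 and the 2R_k (k = 1 .. s-2),
    solve each equation in turn for the one new unknown R_{k+2}."""
    R = [1.0, R1, R2]
    denom = 2.0 * (3.0 - 4.0 * R1 + R2)

    def r(j):
        return R[abs(j)]

    for k in range(1, len(two_R) + 1):
        nxt = two_R[k] * denom - r(k - 2) + 4 * r(k - 1) - 6 * r(k) + 4 * r(k + 1)
        R.append(nxt)
    return R
```

Because each step reuses the previously recovered values, any error in $R_1''$ or $R_2''$ is carried forward, which is the cumulative effect the text warns about.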

It is clear that similar methods could be applied in the case of sessional changes
of higher order, but I have taken the algebra in these three Problems, as the
results will be used in the reduction of the experiments later on. The general
explanation and equations may have appeared long, but the actual calculation in
any particular case of such quantities as ${}_1R_1, {}_1R_2, \ldots {}_1R_s$, or ${}_2R_1, \ldots {}_2R_s$, and then of
$R_1, \ldots R_s$, and $R_1'', \ldots R_s''$, is exceedingly simple, and far shorter than a direct
calculation from the crude figures would be. In two cases the correlations were
calculated both by the difference correlation method and directly without
approximation, and the agreement of the former results with the latter established
confidence in this method of approximation.


VI. EXPERIMENT A (TRISECTION). REDUCTION OF OBSERVATIONS.

(a) The individual Series. 

The observations of this Experiment have been reduced in more detail than in
the other cases; the values of $\rho_k$, k = 1, 2, ... 13, were found separately for each
series, and these and the values of $d_k$ and $\sigma_k$, the means and standard deviations
of the Groups, are given in Tables I, II and III. Several points of interest will be
noted; in the first place the observations have a marked tendency to decrease (i.e.
for the estimate of a third to become smaller) both in the course of a series (as is
seen by the general decrease of $d_k$ as k increases) and also in passing from the
earlier to the later series. These are examples of what have been termed Sessional
and Secular Changes. These changes are illustrated in Figure 3, where the centres
of the circles give the values of $d_1$ for each Series, the length of the dotted lines
from either side of these points representing the standard deviations $\sigma_1$, and the
continuous lines through the points representing the "best" fitting straight lines
for the 50 observations of Group 1; the slopes of these last lines, or constants ${}_pb$,
have been calculated by the Least Square method as in Problem 2, p. 41, and their
values are given in the 3rd column of Table IV.

[Fig. 3. Distribution of Personal Equation in Trisecting Lines, plotted against place of series in order.]


Another way of examining the sessional change, and of obtaining a typical
representation of it, is to calculate the average values for the 20 series of $y_t$, the
tth observation in a series; thus

\[ \bar{y}_t = \frac{1}{m} \sum_m y_t = \frac{1}{m} \sum_m ({}_pd + Y_t), \]

where ${}_pd$ stands for the mean of the pth series (63 observations) as opposed to ${}_pd_k$,
the mean of a particular Group k of that series.

The values of $Y_t$ represent the sessional variation in any series about the
mean of that series or session of observations, and the sequence $\bar{y}_t - D$, t = 1,
2, 3, ... 63, will clearly represent the mean sessional change. The values of $\bar{y}_t$ are
given at the end of Table II and have been plotted in Figure 4, where they have
been fitted with the second order parabola (calculated by least squares)

y = "486 + 00255t — ‘00001898 .............0. eee (xxx). 
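The averaging over series and the least-squares parabola can be sketched in a few lines. This is a minimal illustration, not the author's computing scheme: the function names are hypothetical, `series` is assumed to be a list of m lists of n observations each, and the normal equations are solved by plain elimination.

```python
def mean_sessional_change(series):
    """Average the t-th observation over the m series and subtract the
    grand mean, giving the sequence (ybar_t - D), t = 1 .. n."""
    m, n = len(series), len(series[0])
    ybar = [sum(s[t] for s in series) / m for t in range(n)]
    grand_mean = sum(ybar) / n
    return [v - grand_mean for v in ybar]


def fit_parabola(values):
    """Least-squares fit values[t-1] ~ a + b*t + c*t**2 for t = 1 .. n.
    Returns (a, b, c)."""
    n = len(values)
    ts = range(1, n + 1)
    # Power sums of t and moments of the values give the normal equations.
    S = [sum(t ** p for t in ts) for p in range(5)]
    T = [sum(v * t ** p for t, v in zip(ts, values)) for p in range(3)]
    A = [[float(S[i + j]) for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))
```

Applied to the 20 series of 63 observations, `fit_parabola(mean_sessional_change(series))` would give constants of the kind quoted in equation (xxx), up to the choice of origin for y.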


[Fig. 4. Trisection Experiment. Mean Sessional Change. Estimate of 1/3 of length (mean of 20 series) plotted against order of observation in individual series (1 to 63), with the actual third marked.]










Figures 3 and 4 together show very clearly the marked sessional change ; while 
the former shows that except in a few series, notably Series I, IV and X, the 
regression is remarkably constant in its value, the latter indicates that the sessional 
change is better represented by a parabola than by a straight line. 

The sessional change can also be represented numerically with the help of the
correlation ratio of y upon t. If we are dealing with the observations freed from
the secular change, that is after the removal of the means ${}_pd$ from the 63
observations of the pth series, we have $\eta_{yt}$ given by

\[ \eta_{yt}^2 = \frac{\sum_{t=1}^{63} (\bar{y}_t - D)^2}{63\, S'^2}, \quad \text{where } S'^2 = \frac{1}{1260} \sum_m \sum_{t=1}^{63} (y_t - {}_pd)^2, \]

or S' is the standard deviation of the whole 1260 observations after the removal
of the secular term*. Then the ratio of the root mean square distance of every
observation from the regression line or line of means $\bar{y}_t$, to the standard deviation
of the observations is

\[ \frac{1}{S'} \sqrt{\frac{1}{1260} \sum_m \sum_{t=1}^{63} \{(y_t - {}_pd) - (\bar{y}_t - D)\}^2} = \sqrt{1 - \eta_{yt}^2}, \]

where $\sum_m$ indicates summation for the 20 series.





This is a measure of the closeness of fit of the observations in a series to the
mean sessional change as represented by the values $\bar{y}_t$; the larger $\eta_{yt}$, and therefore
the smaller $\sqrt{1 - \eta_{yt}^2}$ is, the more nearly does a sessional change of the same form
recur in series after series. A comparison of the values of $\sqrt{1 - \eta_{yt}^2}$ for the different
experiments will show the relative significance of their mean sessional changes.


In the present case the value of $\eta_{yt}$ is found to be .579 ± .013, while
$\sqrt{1 - \eta_{yt}^2} = .815$.
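As a rough illustration of how such a correlation ratio could be computed from raw series, the following sketch removes each series mean (the secular term), averages the deviations at each position t, and compares the spread of those averages with the total spread. The function name and the layout of `series` (m lists of n observations) are assumptions made for the example.

```python
import math


def correlation_ratio(series):
    """Return (eta, sqrt(1 - eta**2)) for y upon t, after removing
    the mean of each series from its own observations."""
    m, n = len(series), len(series[0])
    # Deviations from each series mean (secular change removed).
    dev = [[y - sum(s) / n for y in s] for s in series]
    # Mean deviation at each position t: the mean sessional change.
    col_mean = [sum(d[t] for d in dev) / m for t in range(n)]
    total_var = sum(v * v for d in dev for v in d) / (m * n)  # S'^2
    between_var = sum(c * c for c in col_mean) / n
    eta = math.sqrt(between_var / total_var)
    return eta, math.sqrt(max(0.0, 1.0 - eta * eta))


# Consistency check on the two figures quoted in the text.
residual_ratio = math.sqrt(1.0 - 0.579 ** 2)
```

With the quoted $\eta_{yt}$ = .579, `residual_ratio` agrees with the .815 given above to three decimals.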


It would be an interesting problem to obtain the correlation of the successive
residuals left after the ordinates of the "best" fitting parabola for each series had
been subtracted from the observations of that series; but although this has not
been done, a fair idea of the degree to which the correlation of the successive
judgments in the individual series is due to the sessional change can be obtained
by removing the "best" fitting straight lines from each series. The values
calculated for the ${}_pb$'s have been referred to above, and using these and the equations
(xxii) to (xxiv) of pp. 41 to 43, the values of $\sigma_1'$ and $\rho_1'$, or the standard deviations
and correlations of successive observations freed from the linear sessional changes,
have been calculated and are given in the 4th and 6th columns of Table IV. The
$\rho_1'$'s are all less than the corresponding $\rho_1$'s, except in Series X where they are

* Actually it is only the values of the Group Standard Deviations $S_1', S_2' \ldots S_{14}'$ which have been
calculated; they are not all equal (as shown in Table V) owing to the sessional change in standard
deviation, but an approximation to S' sufficiently accurate for the purpose will be given by taking

\[ S'^2 = \tfrac{1}{14} \{ S_1'^2 + S_2'^2 + \ldots + S_{14}'^2 \}. \]










*ajouyooy gz “daeg , 
ws re a is “id oer quouspnl Jo splooad U9eM4Oq [VAAIJUT WRIT 


shI m9 (4 S[@tag Areuruntpead 4 ay3 Surpnpour) suoryeadrosqo OL JO Soles B OJ UOHVy OUITY UROTT 





P60. F 9160. 9980-F | 1080.F | GTLO.¥ | OL90-F | 98FO.F FeO. ‘a’ JO ONTVA 





OZ. 0g: 9 | OL 08 | ** d Jo onTea 
| 


‘saypruin 4 ay, fo sug OG Wout paynynaznd UuoYym—a.log Jo yuawYfaoy JO SLO“ 2790904 q 








UB 1ES-+ BOF + CETE.+ CTLE-+ COPG-+ L8FE-+ 18¢S-.+) LEZ SI. +| OPII-+ 9€¢%-+ ZL9%-+ o1ge.+| 6109-+| xx 
"Ue LEZO-— OOOT-+ LEGL-+ E6LI-+ OLEE-+) O8EF-+ CIF. +) O9CF- +) SFIG-+ Ez9¢.+! EE09.+) 616G-+| BIZL- | XIX 
‘ue OFFI-+ 8¢90-+ FEG0.—| FE80-—| ¢990-.+)| L690.+ €9FO-+| | FLOZ-— Z99L-—  9LEO.— | 6ZRO- +} or. 
; ‘Ue [C8 + E8ZF-+ L6G. +) OLEF-+) G6GE. +) OZ6E.+ OBES. +| n+] B9Fr + 90g. +) ZeLp- +) ggt9.+| 9¢e° +| 
froyeaatosqg] — ‘ure? 16E0-— 09G0.—  9900-—  ZFIL-+) GLZ1-+) FFGO. +) ZT8O. +| GGES-+| LOE + LOGE. +) FOLGE. + ZIP. Sid 
4B SULTAOU Ul SULMSvaUT 938, q ‘ 61 wd [F1%-+ [281+ 6IZI-+ OLOL-+| GFZI-+| OS6I-+ 0Z6Z-+ C99E-+, LLZ>. +) 6Z09-+) 96LL- + O96 38. +| 6998- +] 
Surpvea Youur woaz poaty sodgy yurd FEOF. + ELEE-+ 169%-+ IGLL-+) ELZL-+) IFIL+ 81z-+ gerz-+) 996Z.+  OLEF-+ SI09-+ ¢199.+| G96L- +) 
= rune =Z980.+, O99T-+, LOZ +) L16Z-+) FOZE-+| OLOE. +) LFZE-+  196Z-+]| 6OLE-+! L6EE-+ STOE-+ good. + +| 1689- +| 
_— sard ¢1[¢0.—| LOFI-— 69¢90.+! G90I--+) Z8FO-+) T190.+ 862%. +! 9E9Z- +) L16Z- +) GLIF- +) FEF +) LETS. +| O9E9. +) 
— (Ure ZELP-—| SZhr-—| LO&F.—| G FP8E- —| SENT. — H8OL-— | 620. +) €660-+ Se9I-+ [60E-+) PIS. +) 18eL-+ 
XI Ul Worneutyss | | | 
cL 4y S104 ‘id eimicaa atin ae furd €EFO-. +! LOLZ-+) GCGS- + | } ISEl- +} ¢190-+, LOFO. +) CELO-—| 8Z0-+ COET-+ O68E-+! Z6E9. +) IGTL-+ 
G © | cou pus pure O&FI-+| EGZI-+) 90GB. +! OF LE9E. +) 9OGE. + GGCE. +) FLOP. +) GSEE-+ BEFE-+| 96GE.+ SCOF-+| GLOL- + 
0 + | pear, | mp pp sued ELOe- + BOVE + 160F-+ ZORE-+ OFZE.+| SFIP-+ 099Z-+ GZ8E-+) ZOLE-+ OFEE-+ L16E.+ 6809- + 
sch 9 ? (ure [COS + SFET-+! OOZI-+ 9FLO-+| 98TO- +} GOGO. +) OLOI-+| GPIZ- +) 6PRZ-+ E6IZ-+) 189E-+ Lepg. + 
— Gare =GG66E-+ FIEh- +! SF8I-+ FL8L-+ €990-+) SEL0-+ O8FI-+)| O8GO-—| 990Z-+ ZLEL-+) 6FLI-+ GOFF- +| 06¢¢-+ 
: = ‘ure = G1ZI-— 9860-—  FI9T-—| ZFFO-— QLL0-—| LE00.— | OLGT- +! 960-+) G8FO.+ L69I-+! ZOZ-+)| GOES: +] FEZE-+ 
A10ywAdasgQ ¥V SI Z| 
—F1 a0g soquid Sutmsvout 1oqyyy LT Ure 10€0-+ GIET-— G0G0-+) Z890-+ OLG0-—| E8L0-+ ZG6Z-+| ELZO- | 8F60-+ ZFEO-—| 66E0-+ | 8016-+| 09F0-— x 
as QI We CFIC.+ L6EP-+ EC68-+) LIEE-+) 1ZZb-+) PFGZ-+ FHOST- +! F8ZS--+| SLLE-+ 186E-+ EegE.+) C90G.+) O9GE. 
— “Ue EOE. + OGLE. + LE1G-+! OILL-+ GHBI-+)| CL9Z- + [998--+) Lece.-+| L6eh-+ |EszF. +) LLCh. +) FETG.+) cere. +| 
= AvyT L "We = 1EZ0.— | 8E60-— | FELO-+! CLOI-+ ESTI-+| 1F00-+)| L0G0-+) [Z00--+| 6810-+ EI8I1-+! 9TFO-—| CEP. + | 008 + | 
| | | 


Z 
=) 
R 
0 
= 
= 

i 

D 
Z 
C) 
& 

a 








SYIVULEY (0z61) 93%q ; ; a 


Biometrika xiv 





TABLE I. TRISECTION. Coefficients of Correlation for Separate Series, Dates, Times and Remarks.

















| a) 


LOSP-Z = SUOTFVAAOSQO [TB JO URETY | 
8CEEO.— CPrlEed- ~ FLI6éG0- — 819Z0O- — | OLEZO- — LE610. — E69 LO. — COFLO- — GFELO. — OEOLO- — |CF800-. — FIDO. — E9FOO- — |LETZGO-. — SUvOTY 
































| | | | | 

= | &2h-% | €9 | 6FF.Z 2G | 1G | | 
-& GOP. | GO | 6EF- 0G GISI-—| 9671-—-| S6PI-—| OLFI-—| ‘9GFI-—| OSFI.—| FEPI-—| FOPI-—| OBEL—| OLEI-—| SPEI-—| O&EI-—| OSEI-—| FESI-—-| XX 
~ PLE. | 19 | OFF. 61 9ZLI-— 9691-—| F89I-—| SE9I-—| 809I-—| F9GI-—| SFCI-—| SEGI-—| OIGI-—| O9FI-—| SIFI-—| OEl-—| ZEET-—| OOET---| XIX | 
Ss IZg- | 09 | Fer SI GLGI-—| QOPZI-—-| GEZI-—| YOGI-—| FELI-—| OLIT-—| OFII-—| PHLT-—-| HHII.—| GELI-—| GFHII-—| GSLI-—| 98O1-—| 9S0T-—| ITIAX | 
4 6GE. 6¢ | GFF- LI SIGI-—| P8II-—| OLII-—| GHIT--| 9OLT-—| FFOT-—| OLOT-—| FOOT-—| 9260. —| 9&60-—| ZERO. —| 9E80-—| 8O8O-—} 9FZO-—} ITAX | 
icy SOP. 8¢ | 6FF- 91 960I-—| ZZOI-—-| 9IOT-—! G860-—| 9F60--| OI60. — OL80- —| ¥é80-—! OLLO-—| GELO- — | O690- --| SF9O- —| ZE9O-—|} O190--| TAX 
Fc 6LP- Lo | 69F- cl PSGI-—| PHPGI--| PHEZI-—| YIGI-—!| FSLI-—! PZII-—| OSOI-—| CEOI-—| 8960-—| OZ6O-—  Sego. —| FI8O. —| 09Z0-—! OLZO- > AX 
8 vEP- | 9° | 96F- FL | 89¢0--| GZc0.—| 98FO.—| G9FO-—| g6eO.—| FEEO-—| FLZO- — —' 8L10- -| 9E10-—| 8L00.—} ZG00. —| GZOO-—| 0000. | ATX 
= PEP. | SS | Sé6P- 2. | | O@¢0-—  OTgO. 24, O6FO.—| GStO-.—-| ZIFO.—| F8EO.—| 9GSEO.—| GEO. —| F8ZO-— | FYIZO- —| 8CZO.—| 99ZO. — | F9ZO. —| SFZO.—| IIIX 
S CGP. | FE | LSP. GL | | Z6IO-—| 9810-—| Z8lO.—! YFLO-—!| OFTO.—) O€10-—! FILO-—| 0100--—  OFOOD- - — F000-— | F000.+) FOOO-+) FLOO-+} IIX | 
3 PPP- €¢ | €o¢. Il | | ¥OPO-—-| SOFO.—| S8ZFO-—| OSFO-—| OLFO.—| F9FO-.—| OLFO-—| FEFO-—| 98EO- — — | 89€0-—| F880-—| FI7O-—| 8IFO--| IX 
& O9F- | Go | CLP. OL | | S8tO-+) FEcO.+| O190-+|) G990-+) ZELO.+| ZELO-+| 9ELO.+) OILO-+ OTLO-+ + £290-+ O8¢O.+! Z8¢0-+) O660-+| xX 

PEP. Ig | GLP. 6 VISI-—| 9I8I-—| 9O8I-—| FOSI-—| 98LZI-—| OSZI-—| OOLI-—| O9LI-—| FRLI- — —| 8OLI-—;| G99T-—| O€9I-—-| 96SI--| XI 
Ss ChP- OG | GSF- 8 | @810-- 9010-—| 8200-—) F200-—! gg00. —| 8Z00- +! OFOO. +) 9800. O@LO- + + O1ZO-+ Ff€20.+; OLZO-+) FIEO- +| IITA 
% PP. | 6P | Esp. L PIGI- +! OZI-+) O€ZI-+) PLEL-+| Z6ZI- +! OFPET- +) GOEL. +) C6EL-+ FSET. + + SLEI-+)| PLEI- +] O6EI-+| SPFI°+! IIA 
~4 LEP. | SE | ZSP- 9 86Z0--— $¢6Z0-—  90Z0.—! 8910-—-| ZZ10-—! Z900-—| O€00- -— | 0000. GL00-+, OFOO- + FL00-4+) 8L00-+ OLLO-+, ZELO.+ [A 
- COP. LP | 68F- | G ZSTO--| OO10-—, ZFOO-—| 9E00-+! 9200.+) SELO--+| FSO. +! ZEZO. +! 9CZO.+ Z9ZO.+. F8ZO-+ Z6ZO-+) 86ZO-+. SOKO. + A 
3 Ith | OF | eze. | 14 OFIO- +} O€10-+! 9GTO.+:> SLIO-+) SL10.+! 9020. +! OLZO. +) 90Z0-+ OLIO-+ 9Z1O-+) ZIIO. +) SIO. +) FZIO- +. FILO. + oa 
‘= Ich. | CF | cre. € FZ60-+| 9€60-+| Z960.+, GZOI-+| OOLI-+) SOTI-+) GIZI-+| OLZI-+) OGZI-+) SOZI-+ OSZI-+) ZIEI- +) OCEI- +) OGEI-+} IIT | 
BS 9GP- | PF | LES. | G PSPI-+) OIGI-+| GFCT-+| SG6GI-+| SFOI-+; SL9I-+) SILI-+) FOLI- +) 9Z8I-+ F8BI-+ O€6CI-+) 9661. +) ZEOS-+| 9EOE- +] IT 
K OCF-Z | Sh | FES-Z I GELI-+) FOIL-+ ZOSI-+| 9OZI-+] OIZI- +, OZGI- +! FESI-+| GZZL- +) SSLI- +) FSLT-+ OLII-+| QOZI-+) G6II- +) SETI. + I 
RS aa = sans hl ata se ip oammeaea \3 a j - * haa r eons 
~~ eS ae ie h 2 *h 2 FL dnory | ey, dnory gt dnory | TT dnorg |QT dnory | 6 dnory |g dnory | pdnoin gdnorg ¢dnoiy fF dnoiy ¢g dno |Z dnowy) Tdnory | soleg 
Ss } 
S aN aS Ses ee Sie — 


‘salag yona fo woyna ‘(sayour QOOG.Z Uebrwo wows) sdno.H sarwag fo suvayy 


“19890 «yt» 94} JO sonDA woaTy senbistiiiien: tee aitiands 


0 





“MOTO TA eQB], WoIy USyey Useq oavy 


TIUeL I} JO u10330q oy} 4" 


Igy JO Son[VA OIL], 





Zz 
=) 
RM 
63 
=< 
<>} 
am 
D 
Zz 
o 
ido] 
ea 


¢9060- 


OFFO- 
19€¢0- 
OF090- 
9€0L0- 
FE190- 
c6¢L0- 
9¢160- 
61Z80- 
€6990- 
98060- 
€PcLl- 
GOFOT- 
O€FTT- 
G8 LOl- 
6Z980- 
€9680- 
9@L90- 
S8Ecl- 
G96GT- 
LIGLO- 


éL060- 


LLPVO- 
og¢co- 
48090: 
GLOLO- 
é9190- 
[FELo- 
6¢960- 
69280: 
GEL90- 
EL 160- 
GL8I1- 
éLEOl- 
SLPFI- 
SLIOT- 
8¢E80- 
O0Z680- 
9¢890- 
9ESGT- 
EL8El- 
92690- 





OL060- 


LL¥FO- 
969¢0- 
L1Z90- 
L60L0- 
89T90- 
GLELO- 
OL¢60- 
OFZ80- 
€¢L90- 
6€760- 
L69LT- 
FLFOL- 
CILoFI- 
ISLOl- 
O9Z80- 
ZO980- 
60¢90- 
[8&éI- 
Po9EI- 
CE990- 








80680- 


6LEFO- 
90L¢C0- 
¢9090- 
1¢€Lo- 
66290. 
9ZZLO- 
SZF60- 
1Zg80- 
C6190: 
L&Z60- 
c9ZLI- 
6LF0L- 
9ECFI- 
LOFOL- 
PSL8o- 
9LE80.- 
F8090- 
68L1I- 
Foes 


GEL80- 


61LEFO.- 
6F690- 
-98¢0- 
PO9LO- 
L8¢oco. 
LOFLO- 
PFL8O.- 
68F80- 
88890- 
FGL60- 
OOFOT- 
98FOT. 
IéFFI- 
cO9ol.- 
SILLO. 
LII80. 
F8ogo. 
1¢80l. 
I9TéT. 


91°80. 


I6ZF0- 
LE190- 
ccgco. 
C6ZLO- 
IP 

O9ELO. 
19@80- 
GZFS8O- 
F1L690- 
9L160.- 
9ZEOL- 
SZcOol. 
91éFI- 
€ZSOl- 
O€ZLO- 
Z96L0- 
8P6CO0- 
OFS60.- 
SSIZI- 
€Pego. 


CPcso.- 


€EPErO- 
S6190- 
O6E¢0- 
IT¢L0- 
E81Co- 
COO80- 
L6180- 
9PFS8O- 
OZ690- 
[8160- 
8CZOl- 
LILOL- 
OLIFI- 
e960T- 
CE690- 
T99L0- 
TF6¢0.- 
9@¢60- 
O96LI- 
€9Z90- 


| CEC8O.- 
| 


OCEFO- 
CZE90- 
6LECO- 
LOGLO- 
LOLGO- 
GEESO- 
LOSLO- 
8LZso- 
8L690- 
6EZ60- 
O9E€0T- 
[LTLOL- 
IL8€T- 
S6ILT- 
66990- 
O€CLO- 
806¢0- 
8FC6O- 
LOLII- 
E8E90- 





€ECso- 


C8ero- 
ECC9O- 
6LECO- 
[¢9L0- 
F8090- 
L6680- 
LYLLO- 
LOZSO- 
11690: 
98060- 
CFFOL- 
e6LOI- 
ESSEl- 
FOTLI- 
06¢90- 
9CTLO- 
006E0- 
98¢60- 
6LOLT- 


éL990- 


ZOC8O0- 


LLEFO- 
6LC90- 
COECO- 
9LEL0- 
C9C90- 
FEEGO- 
[gLLo- 
eCLS8o- 
8Z0OLO0- 
89060: 
S9LOT- 
FOSOT- 
6ILET- 
9EOLT- 
FLZ90- 
€LTLo- 
CE6co- 
LZF6O- 
ELEOL- 
Z8990- 


FEL8O- 


LECFo- 
8L990- 
sLECO- 
9¢LL0. 
PL690- 
LZ960- 
8C9LO0- 
FETSo- 
G80L0- 
9Z060- 
OC6OL- 
LLLOI- 
SECEL- 
C6OLT- 
FOL9O- 
LOTLO- 
FZ090- 
90°60. 
26660. 


6¢190- 


66F80- 


9FFFO- 
16690. 
L¥EZCO. 
HESLO- 
C6ELO- 
96660. 
L69L0- 
86080- 
[LOLO.- 
FIS8o- 
62ZLI- 
O6FOL- 
Steel. 
€FOLt- 
L8090- 
[61LO- 
8Z8C0- 
FLE6O.- 
OLC6O- 
66¢90- 


SLFso- 


9FFFO- 
99890. 
L[Z8Ff0- 
LO¢SO.- 
L1¢LO- 
L8EOL- 
[96L0- 
9TLSO0- 
[LLOLO- 
OCFS8O- 
9GZELT- 
EFPOl- 
C6CEI- 
O6LLT- 
FR6CO- 
FOELO- 
LOgco. 
19C60- 
L6160- 
€LC90. 


OSF80- 


Errro- 
0Z690- 
PEOFO.- 
F6E80- 
6I8LO.- 
LLLOL- 
[IF L80.- 
FFZSO- 
COLTLO. 
CEFSO- 
[FLT 
€EZOL- 
O90ZI- 
FELL 
[Z6¢0. 
OLZLO- 
ZO06¢C0- 

L9C6O. 
FSL60- 
€6090- 





C9C90- | C6F90- 





| \ 
II dnory | OT dnory | 6 dnory | gdhory 


| 
Pn od ee 
ZI dno | g dnoin|F dnory | gdnory gdnorig [ dnoiy  sarieg | 


FI dnory | 1 dnoin 1 dnory | 9 dnory 
| 








TABLE III. TRISECTION. Standard Deviations of Series Groups (in inches).








equal, but though there is in general a considerable reduction, it is clear that
neither a linear sessional change nor a parabolic one of the form represented by
equation (xxx) accounts for the greater part of the correlation of successive
judgments.

The coefficients $\rho_1$ and also $\rho_1'$ vary considerably from series to series, but there
is no very marked progressive secular change. On the whole both $\rho_1$ and $\rho_1'$ are
large when the standard deviation is large, and a measure of this correspondence
will be given by the correlation of $\rho$ and $\sigma$. This can be obtained most readily, and
with sufficient accuracy for the purpose, from the correlation of the ranks of these
variates, by the method referred to in Biometrika, Vol. x. p. 416*.

The results are

correlation between $\rho_1$ and $\sigma_1$, $+.52 \pm .11 = r_{\rho\sigma}$,
correlation between $\rho_1'$ and $\sigma_1'$, $+.60 \pm .10 = r_{\rho'\sigma'}$.

The difference is not significant, and we may draw the conclusion, which could not
have been assumed a priori, that the correlation of successive judgments is larger
when the variations in judgment are larger, and that this relationship does not
appear to be reduced when the large linear sessional change has been removed.
Large values of $\sigma$ might have implied erratic observation and small relation between


TABLE IV. Constants of Individual Series (Trisection).

(The definition of these constants is given on p. 35.)

|   1    |    2    |     3     |     4     |    5     |     6       |     7     |       8         |
| Series | $d_1$   | $b$       | $\rho_1'$ | $\rho_1$ | $\sigma_1'$ | $\sigma_1$ | $\sigma_\delta$ |
| I      | 2.6238  | +.000673  | +.2925    | +.3008   | .06015      | .06093    | .0721           |
| II     | 2.7036  | -.002964  | +.4149    | +.5485   | .08125      | .09182    | .0873           |
| III    | 2.6350  | -.003626  | +.3643    | +.5560   | .08001      | .09561    | .0901           |
| IV     | 2.5114  | +.001718  | -.2521    | -.0460   | .05356      | .05902    | .0854           |
| V      | 2.5308  | -.001555  | +.2520    | +.3234   | .06853      | .07210    | .0839           |
| VI     | 2.5132  | -.001529  | +.2270    | +.3390   | .05495      | .05921    | .0681           |
| VII    | 2.6448  | -.004244  | +.4918    | +.6457   | .09322      | .11154    | .0939           |
| VIII   | 2.5314  | -.002788  | +.4478    | +.6089   | .11369      | .12060    | .1067           |
| IX     | 2.3404  | -.004477  | +.4979    | +.7075   | .07935      | .10233    | .0783           |
| X      | 2.5590  | -.000036  | +.7151    | +.7151   | .11141      | .11141    | .0841           |
| XI     | 2.4582  | -.000972  | +.7320    | +.7381   | .08317      | .08435    | .0610           |
| XII    | 2.5014  | -.002720  | +.4851    | +.6360   | .05923      | .07105    | .0606           |
| XIII   | 2.4752  | -.003594  | +.5101    | +.6897   | .06409      | .08244    | .0649           |
| XIV    | 2.5000  | -.003818  | +.6433    | +.7965   | .05993      | .08141    | .0519           |
| XV     | 2.4290  | -.005588  | +.6810    | +.8568   | .07051      | .10711    | .0573           |
| XVI    | 2.4390  | -.003071  | +.6408    | +.7412   | .06441      | .07819    | .0562           |
| XVII   | 2.4254  | -.004369  | +.2569    | +.6556   | .05840      | .08594    | .0713           |
| XVIII  | 2.3944  | -.000580  | +.2870    | +.3144   | .04568      | .04644    | .0544           |
| XIX    | 2.3700  | -.003236  | +.4935    | +.7219   | .05107      | .06920    | .0516           |
| XX     | 2.3666  | -.001725  | +.2850    | +.5072   | .03680      | .04443    | .0441           |

Mean value of b = -.002425.
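The quoted mean slope can be checked directly from the third column of Table IV. The snippet below is only an arithmetic check; the list name is hypothetical and the values are the tabled slopes for Series I to XX as read from the table.

```python
# Slopes b of the best fitting straight lines, Series I .. XX (Table IV, column 3).
SLOPES_B = [
    +0.000673, -0.002964, -0.003626, +0.001718, -0.001555,
    -0.001529, -0.004244, -0.002788, -0.004477, -0.000036,
    -0.000972, -0.002720, -0.003594, -0.003818, -0.005588,
    -0.003071, -0.004369, -0.000580, -0.003236, -0.001725,
]

mean_b = sum(SLOPES_B) / len(SLOPES_B)
print(f"mean b = {mean_b:.6f}")
```

The result agrees with the tabled mean to the six decimals given.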


* The theory is based on the hypothesis that the variates follow a normal distribution, and though
this may not be strictly true for the $\rho_1$'s and $\sigma_1$'s the method probably gives a sufficiently accurate
approximation to the value of their correlation.










successive judgments, and at the same time high correlation might have been 
expected to result in small variation. The significance of this will be discussed in 
the concluding sections of this paper. 

In Table III giving the $\sigma$'s, it will be seen that in general the standard deviations
increase in the later groups; though this may be due in part to the parabolic form 
of the sessional change, with its tendency to an increasing drop towards the end of 
the series, it is possible that it also indicates a fatigue effect setting in, and causing 
the later observations in a series to be more erratic ; the same phenomenon appears 
in the Bisection Experiment where there is no appreciable sessional change within 
the series. It may in fact be looked on as a sessional change in the standard 
deviation. 

At the end of Table I are given the dates on which the different series were 
carried out, remarks noted at the time as to the condition of the observer, and, for 
the last 14 series, the time taken to mark off the 70 forms*. It will be seen that 
there was a large gap between the times of carrying out the first six and the last 
fourteen series, and this interval of nearly two months broke the continuity of the 
secular change in the means of the series. In Figure 5 the means of Group 1 (or
the $d_1$'s of each series) have been plotted firstly with the order of series and
secondly with the date of series.

[Fig. 5. Trisection. Personal Equation (in inches) plotted against Order of Series and against Time (days from 6th May).]
If $x$ is the personal equation, or mean value of the observations of Group 1 of a
series measured in inches,
$y$ the number or order of a Series,
$z$ the number of days between the 6th May and the date on which the series
was carried out,
we have for the regression straight line of $x$ on $y$

\[ x - 2.4976 = -.01351\,(y - 10.5) \tag{xxxii} \]

and for the regression of $x$ on $z$

\[ x - 2.4976 = -.00233\,(z - 53.05) \]

and these lines have been drawn in the diagram.

* Reference to the 7 trial forms first marked, in addition to the 63 of the Series proper, is made on
p. 28 in footnote.
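The regression of personal equation on order of series can be reproduced from the Group 1 means of Table IV. The sketch below is an illustration only: the function name is hypothetical, and the entry for Series V is taken as 2.5308, which is an assumption since that entry is damaged in this copy.

```python
import math

# Means d_1 of Group 1 for Series I .. XX (Table IV, column 2).
# The value for Series V (2.5308) is an assumed reading of a damaged entry.
D1 = [
    2.6238, 2.7036, 2.6350, 2.5114, 2.5308, 2.5132, 2.6448, 2.5314,
    2.3404, 2.5590, 2.4582, 2.5014, 2.4752, 2.5000, 2.4290, 2.4390,
    2.4254, 2.3944, 2.3700, 2.3666,
]


def regress_on_order(x):
    """Regression of x on its order y = 1 .. n.
    Returns (mean of x, mean of y, slope of x on y, correlation r_xy)."""
    n = len(x)
    y = list(range(1, n + 1))
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy / syy, sxy / math.sqrt(sxx * syy)


all_twenty = regress_on_order(D1)
last_fourteen = regress_on_order(D1[6:])  # Series VII .. XX, renumbered 1 .. 14
```

Under these assumptions the twenty series give a mean near 2.4976 and a slope near -.01351, and the last fourteen a mean near 2.4596 and a slope near -.01346, in line with the regression lines quoted in the text.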


The corresponding coefficients of correlation between (1) personal equation and
order, (2) personal equation and time, and (3) time and order, a meaningless
coefficient but required to find the partial correlations, are

(1) $r_{xy} = -.800 \pm .054$,
(2) $r_{xz} = -.692 \pm .079$,
(3) $r_{yz} = +.882 \pm .033$,
and the partial correlations are
$r_{xy.z} = -.559 \pm .104$,
$r_{xz.y} = +.049 \pm .150$.
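The partial correlations follow from the three total correlations by the usual first-order formula, and the ± figures appear consistent with the customary probable error .6745(1 - r²)/√n. The sketch below assumes both of these standard formulae; the function names are hypothetical, and the inputs are the rounded coefficients quoted above, so the last figure may differ slightly from the printed values.

```python
import math


def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation of a and b with c kept constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def probable_error(r, n):
    """Customary probable error of a correlation coefficient from n pairs."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Twenty series: x = personal equation, y = order, z = time in days.
r_xy, r_xz, r_yz = -0.800, -0.692, 0.882
r_xy_z = partial_correlation(r_xy, r_xz, r_yz)  # order, time constant
r_xz_y = partial_correlation(r_xz, r_xy, r_yz)  # time, order constant

print(f"r_xy.z = {r_xy_z:+.3f} +/- {probable_error(r_xy_z, 20):.3f}")
print(f"r_xz.y = {r_xz_y:+.3f} +/- {probable_error(r_xz_y, 20):.3f}")
```

The near-zero value of the second partial correlation comes from the numerator being the small difference of two nearly equal quantities, which is why it is so sensitive to rounding in the inputs.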
But the interval between the May and July series was so large, that the series
should perhaps be considered as forming two groups, one of six and the other of
fourteen. Taking the last fourteen series, we have the regression lines

\[ x - 2.4596 = -.01346\,(y - 7.5), \]

the Series VII being given the order 1, VIII, 2 etc., and

\[ x - 2.4596 = -.01048\,(z - 6.64), \]

$z$ being the days between 10th July and date of Series. These lines have also been
drawn on the diagrams.

The correlations are

(1) $r_{xy} = -.674 \pm .098$,
(2) $r_{xz} = -.673 \pm .099$,
(3) $r_{yz} = +.956 \pm .016$,

giving partial correlations
$r_{xy.z} = -.143 \pm .177$,
$r_{xz.y} = -.138 \pm .177$.


The point of interest is this: there is a secular change in personal equation 
from series to series; is this change more closely related to the number of series 
or sessions that have gone before (that is, almost, to the experience gained), or is 
it due to some change with time in the observer's outlook ? Suppose that it was 
arranged to carry out observations on a number of different days with varying 
intervals of time between them, and that on each day a certain number of different 
series of observations or sessions were undertaken at regular intervals of perhaps 
an hour or less; any series could then be classified as the pth series of the qth 
day. Then $r_{xy.z}$ (the partial correlation of personal equation and order, time being
kept constant) would give a measure of the relationship between change in personal 
equation and order of series in any one day. This will not necessarily be the same 










as the sessional change, for it has been supposed that this latter occurs only during
the course of a sitting, and is broken by the interval of rest in between. On the
other hand if we take all the pth series of the various days, then $r_{xz.y}$ (the partial
correlation of personal equation and time, order being constant) gives the relation
between change in personal equation and time, taken over a long period.

The long break in the middle of the Trisection Experiment takes away any
real significance from the difference between $r_{xy.z}$ (-.559) and $r_{xz.y}$ (+.049) for the
twenty series, and in the case of the last fourteen series these coefficients are practically
equal (-.143 and -.138), because the intervals between the series were nearly uniform.
In the Timing Experiments, C and D, the arrangement of the series in groups on
consecutive days leads to considerably more interesting results*.

A comparative measure of the consistency of the consecutive judgments in the
different series is the standard deviation of first differences, or

\[ \sigma_\delta = \sqrt{\frac{\sum_{t=1}^{n} (y_{t+1} - y_t)^2}{n}} = \sigma_1 \sqrt{2\,(1 - \rho_1)} \]

approximately. The values of this expression are given in the 8th column of
Table IV.
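The approximate relation between the standard deviation of first differences and the constants $\sigma_1$ and $\rho_1$ can be checked against a few rows of Table IV. The sketch below is illustrative: the function names are hypothetical and the rows are the tabled values of $\rho_1$, $\sigma_1$ and $\sigma_\delta$ for six of the series.

```python
import math


def first_difference_sd(y):
    """Root mean square of successive differences y[t+1] - y[t]."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def approx_first_difference_sd(sigma1, rho1):
    """Approximation sigma_1 * sqrt(2 (1 - rho_1))."""
    return sigma1 * math.sqrt(2.0 * (1.0 - rho1))


# (rho_1, sigma_1, tabled sigma_delta) for selected series of Table IV.
ROWS = {
    "I": (0.3008, 0.06093, 0.0721),
    "IV": (-0.0460, 0.05902, 0.0854),
    "VIII": (0.6089, 0.12060, 0.1067),
    "X": (0.7151, 0.11141, 0.0841),
    "XV": (0.8568, 0.10711, 0.0573),
    "XX": (0.5072, 0.04443, 0.0441),
}

for name, (rho1, sigma1, tabled) in ROWS.items():
    approx = approx_first_difference_sd(sigma1, rho1)
    print(f"Series {name}: approx {approx:.4f}, tabled {tabled:.4f}")
```

For these rows the approximation reproduces the tabled values to about the fourth decimal, which suggests the 8th column was obtained from the formula rather than from a direct summation, though the text does not say so.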

Now suppose we compare the constants in Table IV, the dates and remarks at
the end of Table I and the diagrammatic representation of seven of the series given
in Figure 6. The first series to be remarked on is IV; most of the series were
carried out at the beginning of the morning before any other work, and it is possibly
the fact that IV was done soon after a spell of measuring spectrograms with a Zeiss
comparator that explains the exceptional values of $\rho_1$ and $\rho_1'$, namely $\rho_1 = -.0460$,
$\rho_1' = -.2521$. The $\sigma_\delta$, or standard deviation of 1st differences, is no higher than for
the other series done at about the same time (in May), and the $\sigma_1$ is lower.
The first graph in Figure 6 gives the diagram of this series; the rapid fluctuations
in judgment about a very steady, if slowly changing, personal equation may perhaps
have some physiological significance. In the second and third graphs of Figure 6
are represented two of the four Series VII-X which were done when the
observer was not very fit; they have large values for $\sigma_1$, and the $\sigma_\delta$'s are large
compared with those of the ten series which follow, showing that the judgments
were rather erratic; the correlation is however high. In VIII there is a great
jump between the 44th and 45th judgments, from 2.22 to 2.66, and the gradual
drop down, which follows, to 2.20 (for the 52nd judgment) is a good example of
a way in which successive judgments are correlated. In Series XI (not represented
among the graphs) there appears to be a periodic variation, for the
correlation falls steadily from $\rho_1 = +.7381$ to $\rho_{12} = -.4428$.

XIII, XV and XVI are typical highly correlated series with large sessional
changes; the $\sigma_\delta$'s as well as the $\sigma_1$'s are considerably smaller than in the series
VII-X. In examining the fourth to sixth graphs we notice what may be called
the large scale correlated variations superimposed upon the linear sessional change;
it is due to these variations that the values of $\rho_1'$ remain so large, and it is their
absence that makes the correlation in IV so low. The last graph (Series XX) is
that for which the $\sigma_1$ and $\sigma_\delta$ are least, and yet $\rho_1$ is quite high (+.507).

* See pp. 70, 75 and 83, and below.

As a last instance of these points, we may compare the constants* for XV and
XVIII:

|        | $b$       | $\rho_1'$ | $\rho_1$ | $\sigma_1$ | $\sigma_\delta$ |
| XV     | -.005588  | +.6810    | +.8568   | .10711     | .0573           |
| XVIII  | -.000580  | +.2870    | +.3144   | .04644     | .0544           |




[Fig. 6. Trisection. Diagrams representing variations in judgment: estimate in inches against order of observation in series, for Series IV, VIII and IX. The horizontal line intersecting each graph gives the mean of the first 50 observations in that series.]


* For definitions of these constants the table on page 35 may be referred to.








[Fig. 6 (continued). Trisection. Diagrams representing variations in judgment: estimate in inches against order of observation in series, for Series XIII, XV, XVI and XX. The horizontal line intersecting each graph gives the mean of the first 50 observations in that series.]


XV has a large linear sessional change, but superimposed on this there must be
considerable correlated variations, for the removal of the best fitting straight line
only reduces $\rho_1$ (.8568) to $\rho_1'$ (.6810). XVIII has variations altogether on a smaller
scale; the correlation of successive judgments is low and it is barely affected by the
removal of the linear sessional change. And yet though the $\sigma_1$ for XV is more
than twice as great as for XVIII, the $\sigma_\delta$'s, or measures of the average jump in
estimation from judgment to judgment, are practically identical; the importance
of this constant $\sigma_\delta$ as a measure of variability of judgment is discussed on p. 69
below.









| | 

LIGI- —| OLLI- —| COLT. - 9Z01- — 
L1ZO- + 
PLOI- — 


80Z0- + 
cé6cl. — 


80Z0. + 


EScl. — 


LOT. —| C96I- vi él6I- — 


£020: F| COU. ¥ | 9020. F 
6 8 


6890. — LEFO. —| SIZO- +] EOL0. + 


Z1ZO. + E120. + 
6190: — ¢cFO. — 


L 9 i 7 € 





‘abunyy pouowsag wnauvT 





TA WTAVia 








9F LO. + 


SLFO: 


“O0E0- 


0090. + 


€6€0. + 


GLCO0- 


[¢co.- 


CFIO. + 


¢900. + 


F9EO- + 


Cgzo. + 


CZCO.- — 


68F0- 


LLFO. + 


OLIO. + 


e901. LIé@1- FIGI. srl. 


OGEI- 


9020. 
9E8I- 


80Z0- + 
SO9L- 


80Z0- 
LFO9L- 


LO@O. ¥ 
61LI. 


L0ZO- 
Iggl- 
9100. + 
OLO60- 


62 LOO- 
9LESO- 


FE LOO. 
80680. 


ZELOO- 
GELSO- 


LETOO- 
GLOGO. 


LE TO0- 
£9060. 


8ZL0- 
SEsg- 


LGLO0- 
9FE9- 


LELO. + 
Clz9- 


O€TLO- 


GECI- 


€ET1O. 
LGT9. 


66100. 
CoLEI- 


OOZOO- 
OLZET- 


LOZOO- + 
FOkel- 


LOZOO0- 
SESE L- 


L0Z00- 
LPESl- 


T0G00- 
OL€él- 


8SEEo. — FI6CO. — SI9ZO.— OLEZO. [C6 L0- 


OG-@ 


CFLEO- 


| 6ZTO0- 


99s T- 00ZE. OGES- LOSE: 


GOGO. + 
FEE. 


9020. + 
OI6L- 


9610. + 
ELS 


€0ZO- 
S1éZ- 
SZ100- + 
ZOGRO- 


6ZLO0- 
CECRo- 


SZ LOO- 
CFE80- €ECs0- 
ZLLO- + 
FO69- 


SLLO. + 
F699- 


6L10- 
GFOO- 


€Z10- 
S6F9- 
!61L00- + 
ZPOEL- 


86100. + 
COLE l- 


S6LO0- 
9FLEL- 


66100. + 
COLEL- 


| 6910. —| G9PLO- —  CFZLO- —| 9EOTO.- 


ecg. + 


| 
€1Z0. #| Z1ZO. F  -LOZO. F 
60Z0- +| 6690. + FEO. + 


wolf paatf{ suoymasasg” anissadong fo worynjatoy fo 


) suoyenbry 9v0u0 

| -dayty pug wou , 47 
Z9LO- +} | suorenby eoud 
G68F- +) MHI, ST wou "ay 


L810. + 
FOCE. + 





1=4¥ 





squawyjao,) 


CLOO.- I#90-—-; L86I.+ 


CCrO. 9180-—,; GChO-+| 1r9E-— 


99TE- LOB8E. 9ZZG. 9FZ9. + 
I8LO-. + 
SL8E- 


O& TO- 
9PEY- 


ecto. 
L€Ze.- 


c6LO- 
CClLeé- 


L@LO0- 
OCFS8O- 


8ZL00- #' SZ100- 
66F80- SLFSO- 


62 L00- 
FECRO- 


600. 
9F6L- 


8600. €900.- 


19€ZL- 


LOTO- 
6COL- 


C6100- 
9ZE6EI- 


F6 LOO0- 
Z68EI- 


S61L00- 
[96ZI- 


96 L00- 
€00€ L- 
€9FOO0- L€ZO0- — 


OG-G 


CPFs8o0- CF900- 





(U01j9a81.L J) Saiuagy paurquioy fo szunjsuog 


‘A ATAVL 











EGON S. PEARSON


(b) The Combination of Series. 


Having discussed the reduction of the individual series, I will proceed to consider the results of combining the 20 series. The formulae (v) to (viii) on pp. 33 and 34 give the values of $D_k$, $S_k$ and $R_k$ which are tabled below. Remembering that $D_1$ and $S_1$ are the mean and standard deviation of the combined observations $y_1, y_2 \ldots y_{50}$ from each of the 20 series, $D_2$ and $S_2$ the mean and standard deviation of the combined observations $y_2, y_3 \ldots y_{51}$, and finally $D_{14}$ and $S_{14}$ of $y_{14}, y_{15} \ldots y_{63}$, we see that the progressive decrease in $D_k$ as $k$ increases indicates the shortening in the estimate of a third during the course of a sitting, while the increase in $S_k$ may perhaps be partly due to increasing variability of judgment due to fatigue. The values of $R_k$ are large, but this is to be expected owing to the large changes in personal equation from series to series; in fact for $k = 13$ it will be found that the limiting expression $L_k$ of page 34 gives

$$L_{13} = +.5435, \quad \text{while} \quad R_{13} = .6151.$$

The reason for this difference between $L_{13}$ and $R_{13}$ is that $\dfrac{\Sigma(\rho_{13}\sigma_1\sigma_{14})}{m}$, and therefore $R_{13}'$, does not vanish. The next step is to obtain the values of $S_k'$ and $R_k'$, or the standard deviations and correlations of successive judgments after the secular change has been removed. They are found from Equations (ix) and (x) of p. 34 and are given in Table V (5th and 6th rows).


There is here an opportunity of testing the accuracy of the Difference Correlation method discussed in Section V (b); the case is that of Problem 1, page 41, the values of $R_1, R_2 \ldots R_{13}$ are known and give the correlation of 1st differences, ${}_1R_1, {}_1R_2 \ldots {}_1R_{12}$; these together with the coefficients of correlation of 2nd differences to be used later, are given at the bottom of Table V. Then using the value +.6246 for $R_1'$, we get the values of the 12 quantities $R_2' \ldots R_{13}'$, which have been inserted
in the 7th row of Table V. It will be seen that the values obtained by this 
approximate method agree well with the others, the differences being within the 
probable error of the R,’’s up to and including R,’; beyond this the approximate 
values become rapidly too small, the error, from the form of Equations (xx) and (xxi), being clearly cumulative. This failure is certainly largely due to the fact that the errors involved in the assumptions (a), (b) and (c) of p. 37 are not negligible when the later groups enter into the correlation, for we have already seen that both $D_k$ and $S_k$ change steadily with $k$.


The values of $S_k'$ and $R_k'$ in Table V correspond to the average values of the standard deviations and correlations of successive judgments in the individual series, i.e. of the $\sigma$'s and $\rho$'s given in Tables III and I. Owing to the sessional change which occurs during the course of nearly all the series, $R_k'$ does not vanish as $k$ increases, but appears to approach a limiting value in the neighbourhood of +.16. By obtaining for the separate series, the coefficient of correlation, $\rho_1'$, of the successive observations at intervals of one, freed from the linear sessional change, a step has been made towards the further reduction of the problem. $R_1''$, the





60 On the Variations in Personal Equation 


coefficient for the combined series corresponding to $\rho_1'$ of the individual series, is given by

$$R_1'' = \frac{\dfrac{\Sigma(\rho_1'\sigma_1'\sigma_2')}{m}}{\sqrt{\dfrac{\Sigma(\sigma_1'^2)}{m}\cdot\dfrac{\Sigma(\sigma_2'^2)}{m}}},$$

and taking $\rho_1'$, $\sigma_1'$ and $\sigma_2'$ calculated from Equations (xxii), (xxiv) and (xxv) we find that

$$R_1'' = +.48922 \pm .01623.$$
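The way a combined coefficient of this form is built up from the constants of the individual series can be sketched in Python. The function name and the three-series inputs below are hypothetical illustrations, not values from the paper; the sketch only assumes the formula as reconstructed here, in which each series' correlation is weighted by the product of its two standard deviations.

```python
import math

def combined_correlation(rhos, sds_a, sds_b):
    """Combine per-series correlations into one coefficient.

    rhos  : correlation within each series
    sds_a : standard deviation of the first group in each series
    sds_b : standard deviation of the second group in each series
    Returns mean(rho * sd_a * sd_b) / sqrt(mean(sd_a^2) * mean(sd_b^2)).
    """
    m = len(rhos)
    mean_product = sum(r * a * b for r, a, b in zip(rhos, sds_a, sds_b)) / m
    mean_var_a = sum(a * a for a in sds_a) / m
    mean_var_b = sum(b * b for b in sds_b) / m
    return mean_product / math.sqrt(mean_var_a * mean_var_b)

# Hypothetical inputs: three series with equal spread, so the result
# reduces to the plain average of the three correlations.
equal_spread = combined_correlation([0.2, 0.4, 0.6], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])

# Hypothetical inputs: the third series is three times as variable,
# so its correlation dominates the combined value.
unequal_spread = combined_correlation([0.2, 0.4, 0.6], [1.0, 1.0, 3.0], [1.0, 1.0, 3.0])
```

With equal spreads the combined value is simply the mean correlation; a more variable series pulls the combined value towards its own coefficient.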

Then $R_2'', R_3'' \ldots R_{13}''$ can be found by the method of Problem 2, p. 41; or again using the value of $R_2''$ found from the first difference equations, we may proceed to second differences as in Problem 3, and so obtain $R_3'' \ldots R_{13}''$. In this particular case there is no need to use the second difference equations*, but the values of the $R_k''$'s have been worked out by both methods, as numerical examples of the theory of Sections V (a) and (b). A comparison of the values given in Table VI shows that there is no significant difference between the results of the two methods†, and the agreement found earlier in this section between the values of $R_k'$ calculated directly and those found from the difference equations, warrants confidence in the results for $R_k''$. Although the negative values of $R_k''$ are probably too large for the higher values of $k$ (just as the later positive values of $R_k'$ in Table V, row 7, were too small), there is no doubt, I think, that the correlation of the successive observations freed from the linear sessional change, does become negative at $k$ = 5, 6 or 7 and remain negative for the higher values of $k$. A word of qualification is necessary; the linear sessional change to be removed has been represented by the line "best" fitting the first 50 observations of each series, and a glance at Figure 4 shows that the mean values of the later observations in the series of 63 would lie well off this line because of the parabolic
form of the sessional change; the negative values found for R,’, R,”, etc. may 
probably be largely accounted for by this fact. A more satisfactory approximation 
to the correlation of successive judgments freed from sessional change will be 
obtained in Section XI below. 
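The procedure described in this paragraph (fit a straight line to the first 50 observations of a series, subtract it from all 63, and correlate the residuals at interval $k$) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and the synthetic 63-point series are hypothetical, and the sketch correlates the residuals of observations 1 to 50 with those of observations $1+k$ to $50+k$ in a single series rather than reproducing the combined calculation of the paper.

```python
import math

def fit_line(ys):
    """Least-squares intercept and slope of ys against t = 1..len(ys)."""
    n = len(ys)
    t_mean = (n + 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys, start=1))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def pearson(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lag_correlation_after_detrending(series, k, base=50):
    """Correlate residuals at interval k after removing the line
    fitted to the first `base` observations only."""
    intercept, slope = fit_line(series[:base])
    residuals = [y - (intercept + slope * t) for t, y in enumerate(series, start=1)]
    return pearson(residuals[:base], residuals[k:k + base])

# Hypothetical series of 63 judgments: a slow linear fall plus a wave.
demo = [2.60 - 0.003 * t + 0.05 * math.sin(2 * math.pi * t / 9) for t in range(1, 64)]
lagged = [lag_correlation_after_detrending(demo, k) for k in range(0, 14)]
```

Because the line is fitted to the first 50 observations only, any curvature in the later observations is left in the residuals, which is the qualification the text raises about the later coefficients.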


As $\sigma_3 = \sigma_1\sqrt{2(1-\rho_1)}$, referred to on p. 55 above, gives the standard deviation of the first differences of consecutive judgments in a single series, we shall have as a corresponding measure for the combined twenty series

$$S_3 = S_1'\sqrt{2(1-R_1')}.$$

For the Trisection Experiment

$$S_3 = .0732.$$
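The relation between the standard deviation of first differences and the correlation of successive judgments is easy to check numerically. The sketch below uses the constants of Bisection Series V as read from Table IX of this copy ($\sigma_1 = .05158$, $\rho_1 = .5870$, $\sigma_3 = .0469$); the function name is a hypothetical one chosen for illustration.

```python
import math

def sd_of_first_differences(sd, rho1):
    """sd * sqrt(2 * (1 - rho1)): standard deviation of the difference of
    two equally variable quantities whose correlation is rho1."""
    return sd * math.sqrt(2.0 * (1.0 - rho1))

# Bisection Series V, constants as read from Table IX of this copy.
sigma3_series_v = sd_of_first_differences(0.05158, 0.5870)

# With no correlation of successive judgments the factor is sqrt(2).
uncorrelated = sd_of_first_differences(1.0, 0.0)
```

The same function applies to the combined series if $S_1'$ and $R_1'$ are supplied in place of $\sigma_1$ and $\rho_1$.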
* To get an idea of the order of the terms $Q_k$ and $b^2$ which are being neglected, the values were calculated for two values of $k$, with the result
$k = 1$, $Q_1 + b^2 = -.000001064$;
$k = 9$, $Q_9 + b^2 = +.000000192$.
† The probable errors in the Table have been calculated from the usual formula $e = \pm .6745\,\dfrac{1-R^2}{\sqrt{N}}$ and do not cover the errors arising from the method of approximation.
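The usual probable-error formula quoted in the footnote can be written as a one-line function. The check below uses the coefficient $R_1'' = +.48922 \pm .01623$ given in the text and assumes $N = 1000$, the number of combined observations (20 series of 50); the function name is hypothetical.

```python
import math

def probable_error(r, n):
    """Probable error of a correlation coefficient r found from n pairs:
    0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Coefficient quoted in the text, assumed to rest on 1000 pairs.
pe_r1_double_prime = probable_error(0.48922, 1000)

# For a vanishing correlation the probable error is largest.
pe_zero = probable_error(0.0, 1000)
```

As the footnote says, this covers sampling error only, not the error of the approximate difference method.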




























(c) On the possible Effect of shifting the Head during the course of a Series. 

It was suggested to me that the correlation of successive judgments in this 
and in the Bisection Experiment might be due to periodic shifting of the head 
from side to side during the course of a series, some parallax effect of the two eyes 
making corresponding variations in the estimation of a third (or a half) of the 
line on the form. Now such an explanation might account for part of the corre- 
lation in these two experiments, but it could not explain the regular secular and 
sessional changes in the Trisection, except by the highly improbable hypothesis 
that the observer’s head leant over increasingly to one side during the course of a 
sitting, and that he started with it more on one side in the later series than 
in the earlier ones. But beyond this, the fact that correlation is found also in the 
timing experiments suggests that it is of deeper and more complex origin. It is 
likely to arise from many unknown causes affecting the environment and condition 
of the observer, and if one of them is a relative shifting of the eyes, it is of interest, 
for it will enter into many kinds of observations, where the observer who takes 
the readings is not looking through a fixed eyepiece. 

To test the effect of a relative shift between head and paper, 42 of the forms 
were taken, and trisected in the usual way, but for alternate groups of seven the 
paper was shifted 4 inches relatively from side to side. The measures of the 
estimates and their means are in Table VII. The three sets of seven under the 
heading I, were made with the forms in one position, the three sets under II 
with the forms shifted 4 inches to the right. The difference is noticeable at once; 
readings I are smaller than II, and at the same time the curious effect of sessional 
change is appearing, the later readings of I and again of II, being on the whole 
smaller than the earlier ones. Now in carrying out the observations of the 
Trisection and Bisection Series, the body and head were kept as steady as possible, 
and it is unlikely that frequent shifts as large as 4 inches could have occurred ; 
further the differences between the means of readings I and II, are much smaller 
than the actual variations in judgment shown in the diagrams of Figure 6. 

But as a further test, a series of 63 forms were marked off, with the head 
fixed mechanically; the results are given in Table VIII with the usual notation. 
The correlations are not as high as many of those in Table I, but they are comparable with those of Series I, V, VI, XVIII. The sessional change is also indicated by the decrease in $d_k$ as $k$ increases*. Without carrying through a good number of
series with fixed head, no useful comparison can be made, but I think that the 
evidence of this one series is sufficient to justify the assertion that a shifting of 
the head from side to side cannot account for the greater part of the correlation 
of successive judgments. 

* The value of $\sigma_3$ or .074 may be compared with that of $S_3$ for the ordinary series of the Experiment A, which was .0732.

TABLE VII. [Experiments on the relative shift between head and paper: order of observations and estimates for positions I and II, with their means. Printed sideways in the original; the entries are not legible in this copy.]

TABLE VIII. [Constants of the series of 63 forms marked off with the head fixed, for which $\sigma_3 = \sigma_1\sqrt{2(1-\rho_1)} = .074$. Printed sideways in the original; the entries are not legible in this copy.]

(d) Summary.

First considering the individual series, it was noticed that there was a secular change in Personal Equation with time, i.e. the means decreased in passing from the earlier to the later series; in addition there was a remarkably constant sessional change within each series, this change being again a decrease from the earlier to the later observations. There was something in these changes almost analogous to an elastic strain: during the course of a series the estimation of a third drops; in the interval between the series there is a recovery, but not a complete recovery, for the first judgments in the succeeding series start at a little lower level than the first, but well above the last judgments in the series before; this slight "permanent deformation" caused by the "strain" represented in the sessional change, results in the secular fall. The figure below gives an ideal representation of this.





Fig. 7.

$A_1B_1$, $A_2B_2$, ... sessional change in Series 1, 2 ... etc.
$B_1A_2$, $B_2A_3$, ... "recovery" during interval between Series 1 and 2, 2 and 3 etc.
$M_1M_2$ the resulting secular change.
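The "elastic strain" picture can be turned into a small idealised model: within a series the expected judgment falls steadily, and between series only a fraction of that fall is recovered. The sketch below is purely illustrative; the function name and every number in it (starting level, drop per observation, recovery fraction) are hypothetical choices, not estimates from the experiments.

```python
def sawtooth_levels(n_series, n_obs, start, drop_per_obs, recovery_fraction):
    """Expected judgments for successive series under partial recovery.

    Within a series the level falls by drop_per_obs at each observation;
    between series a fraction recovery_fraction of the total fall is regained.
    """
    all_series = []
    level = start
    for _ in range(n_series):
        run = [level - drop_per_obs * t for t in range(n_obs)]
        all_series.append(run)
        total_fall = run[0] - run[-1]
        level = run[-1] + recovery_fraction * total_fall
    return all_series

# Hypothetical parameters: 3 series of 63 judgments, 80 per cent recovery.
runs = sawtooth_levels(3, 63, 2.70, 0.002, 0.8)
series_means = [sum(run) / len(run) for run in runs]
```

With a recovery fraction below one, each series starts below the start of the one before but above its end, so the series means fall steadily: the sessional change repeated with incomplete recovery produces the secular change.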


Then combining the twenty series, in order to get more reliable results, the coefficients of correlation of successive judgments, $R_k$, were obtained; owing to the secular and sessional changes these coefficients had very high values and as $k$ increased, apparently tended to a limit at about +.60. By fitting the means of the series together, the secular change was eliminated, and a series of coefficients $R_k'$ obtained, which represented the average value of the correlation in a series; owing to the sessional change the $R_k'$'s did not appear to tend to zero as $k$ increased but to a limit at +.16 or +.15. The correlation of successive values of the residuals, left after subtracting the ordinates of the straight line "best" fitting the first 50 observations of each series from the observations of that series, gave a set of coefficients $R_k''$, which fell off very rapidly and became negative when $k$ equalled 6 or 7; the large negative values of the coefficients for the high values of $k$ were probably due in part to the method of approximation used, and also to the fact that the straight line fitting the first 50 observations in a series did not represent satisfactorily the sessional change.

The values of $R_k'$ calculated (up to $k = 13$), gave no evidence of any tendency to periodicity in this coefficient, although there was evidence of this occurring in some of the individual series; periodicity in $R_k'$ would indicate marked variations of roughly the same period occurring at any rate in a large number of the series.








It will be shown in a later section that the values of $R_k'$, $k = 1 \ldots 13$, can be fitted very closely by a curve of the type $y = p + q r^k$, where $p$, $q$, and $r$ are constants.
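A curve of the type $y = p + q r^k$ with $0 < r < 1$ falls off geometrically and settles at the constant $p$, which matches the behaviour described for the correlations freed from secular change. The sketch below only illustrates the shape of such a curve: $p = 0.16$ is chosen to echo the limit mentioned in the summary, while $q$ and $r$ are hypothetical values, not the constants fitted in the paper.

```python
def fitted_correlation(k, p, q, r):
    """Value of the curve p + q * r**k at interval k."""
    return p + q * r ** k

# Hypothetical constants: only P echoes a figure from the text.
P, Q, R = 0.16, 0.60, 0.70

curve = [fitted_correlation(k, P, Q, R) for k in range(1, 14)]
far_value = fitted_correlation(200, P, Q, R)
```

For any such constants the excess over $p$ shrinks by the same factor $r$ at each step, so the curve can approach its limit but never cross it, and in particular can never become negative when $p$ is positive.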


Finally it was shown that the correlation of successive judgments could not be 
due to a shifting of the head during the course of observation, although this 
might perhaps be one of many contributory causes. 


EXPERIMENT A. TRISECTION. CORRELATION-INTERVAL DIAGRAM.

[Diagram: correlation of successive judgments plotted against the interval between successive judgments, 0 to 13.]

Fig. 8.


In Figure 8 the values of $R_k$, $R_k'$ and $R_k''$ (for linear sessional change) are plotted to $k$; the theoretical curves of the Equations (xlix) and (lvi) shown in the
Figure will be discussed in Section XI, and also the points referred to as R,””. 













VII. EXPERIMENT B, BISECTION. REDUCTION OF OBSERVATIONS.

(a) The individual Series.

In this Experiment the coefficients of correlation of successive judgments for the individual series were not all worked out, but only the values of $\rho_1$; these are tabled with $\sigma_1$ and $d_1$ in Table IX. The values of $d_k$ for each series, $k = 1 \ldots 14$, are also given in Table X. It will be seen that there are not the same marked secular or sessional changes as characterised the Trisection Series. In Figure 9 the means of Groups 1 of each Series, or the $d_1$'s, have been plotted to "order" and to "time," and again if

x is the personal equation or mean,
y the number of order of series,
z the number of days between 13th June and the date on which the series was carried out,

we have for the regression lines

$$x - 2.8793 = -.0010131\,(y - 10.5) \quad \ldots\ldots\ldots \text{(xxxiv)},$$

$$x - 2.8793 = -.0005359\,(z - 32.10).$$

These lines have been drawn in the diagrams; the coefficients of correlation are

$$r_{xy} = -.156 \pm .147, \qquad r_{xz} = -.337 \pm .154, \qquad r_{yz} = +.945 \pm .016,$$

giving partial correlations

$$r_{xy.z} = +.529 \pm .109, \qquad r_{xz.y} = -.589 \pm .099.$$

TABLE IX. Constants of Bisection Experiments.

Series   $d_1$     $\sigma_1$   $\rho_1$    $\sigma_3 = \sigma_1\sqrt{2(1-\rho_1)}$
I        2.8648    ...          +.4942      .0503
II       2.8624    ...          +.2609      .0664
III      2.9262    .03821       +.0823      .0518
IV       2.8642    .04690       +.4107      .0509
V        2.8290    .05158       +.5870      .0469
VI       2.9114    .04609       +.5768      .0424
VII      2.9178    .04415       +.2993      .0523
VIII     2.9218    .04766       +.4360      .0506
IX       2.8724    .04384       +.1389      .0575
X        2.8990    .04579       +.1018      .0614
XI       2.9238    .03617       -.0423      .0522
XII      2.9298    .04810       +.5089      .0477
XIII     2.8806    .04407       +.2769      .0530
XIV      2.8312    .04955       +.4445      .0522
XV       2.8242    .03606       +.3334      .0416
XVI      2.7976    .04135       +.3190      .0483
XVII     2.8566    .03739       +.5531      .0353
XVIII    2.8808    .03497       +.2776      .0420
XIX      2.8890    .03986       +.5407      .0382
XX       2.9030    .02610       +.1404      .0342

Mean time taken for a series of 70 observations (including the 7 preliminary trials*): 5m 58s.
Mean interval between records of judgment: 5.118s.

* See p. 28, footnote.

[The original table also gives, for each series, the date (1920) and time at start and the time taken for the series; these columns are not legible in this copy.]

Probable Errors of Coefficients of Correlation calculated from 50 pairs of the variates.

$\rho$    P.E.
.80       ±.0343
.70       ±.0486
.60       ±.0610
.50       ±.0715
.40       ±.0801
.30       ±.0868
.20       ±.0916
.10       ±.0944
.00       ±.0945






TABLE X. BISECTION. Table of Means of Groups (in inches).

[Rows: Series I to XX and their means; columns: Group 1 to Group 14. Printed sideways in the original; the entries are not legible in this copy.]





The dates on which the series were carried out—the z’s—are given at the end 
of Table IX; the distribution was more satisfactory than that of the Trisections, 
and the significance of these two partial correlations will be referred to shortly. 
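The two partial correlations referred to here follow from the three total correlations by the standard first-order formula. That the author used exactly this formula is an assumption, and the input values are as read from this copy ($r_{xy} = -.156$, $r_{xz} = -.337$, $r_{yz} = +.945$); the function name is hypothetical.

```python
import math

def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation of a and b with c held constant:
    (r_ab - r_ac * r_bc) / sqrt((1 - r_ac^2) * (1 - r_bc^2))."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))

# Total correlations as read from this copy of the paper.
r_xy, r_xz, r_yz = -0.156, -0.337, 0.945

r_xy_z = partial_correlation(r_xy, r_xz, r_yz)  # order of series, date held constant
r_xz_y = partial_correlation(r_xz, r_xy, r_yz)  # date, order of series held constant
```

The results agree with the printed $+.529$ and $-.589$ to within the rounding of the three-figure inputs. Because order and date are so highly correlated ($+.945$), the two small negative total correlations separate into partial correlations of opposite sign.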

The variation in the means of the series is much smaller than in the case of the Trisections; we have here a range from 2.93 to 2.80 ins. while in the other, from 2.70 to 2.34 ins.; in both cases the secular change is in the direction which lessens the measures, i.e. the marks on the forms in the later series were on the whole further to the observer's left hand than in the earlier series. Nor does experience appear to increase accuracy, for the true position of the half is at 2.97 inches (and of the third at 2.51 inches).




















[Two diagrams, each with its regression line drawn through the mean 2.8793 inches: Personal Equation against Order of Series, and Personal Equation (Mean of 1st 50 observations of series) against Time (days from 13th June).]


Fig. 9. Bisection. Means of Groups 1 of each series plotted with Order of Series and Date of Series. 
Next considering the sessional change, the values of $\bar{y}_t$ (defined on p. 47) have been plotted in Figure 10; the straight line "best" fitting these points is

$$\bar{y}_t - 2.8816 = +.0003534\,(t - 32) \quad \ldots\ldots\ldots \text{(xli)},$$

where $t$ is the order of observation in a series, and the coefficient of correlation between $\bar{y}_t$ and $t$ is $+.5294 \pm .0137$*.

Using the relations of page 48, it is found that

$$\eta_{y_t} = .271 \pm .018, \qquad \sqrt{1-\eta_{y_t}^2} = .963,$$

and on comparing this latter value with that for the Trisections (.815) we see that in the present case the mean sessional change is of less significance.














[Diagram of Mean Sessional Change: estimate in inches against $t$, the order of observation in series, showing the mean at 2.8816 inches and the regression line $\bar{y}_t - 2.8816 = +.0003534\,(t - 32)$. The value of the true half is 2.97 inches.]


Fig. 10. Bisection. t, Order of Observation in Series. 


* This correlation between the mean $t$th observation ($\bar{y}_t$) and $t$ must be distinguished from the correlation between the $t$th observation ($y_t$) and $t$, which is +.143, and as it should be, less than $\eta$.

It will be noticed in looking at Figure 10 that the points ($t$, $\bar{y}_t$) appear to be subject to a fairly consistent periodic variation about the regression line, the complete period covering from 20 to 22 observations. Without a detailed analysis of the separate series, it is not possible to say whether there is a period of this order underlying the variations in judgment in all series, or whether this periodicity in $\bar{y}_t$ results from large variations in one or two series; the diagrams of seven of the series in Figure 11 do not certainly suggest any marked periodic variation, and it is possible that the drop at about the 44th and the peak near the








[Seven diagrams, for Series I ($\rho_1 = +.494$), V ($\rho_1 = +.587$), VI ($\rho_1 = +.577$), X ($\rho_1 = +.102$), XII ($\rho_1 = +.509$), XVI ($\rho_1 = +.319$) and XX ($\rho_1 = +.140$): estimate in inches against order of observation in series.]

The horizontal line intersecting each graph gives the mean of the first 50 observations in that series.

Fig. 11. Bisections. Diagrams representing variations in judgment.











55th observations in Series V and VI, would go far to account for the similar features in the $\bar{y}_t$ diagram, the "y" scale of which is four times greater than that in Figure 11.


Using the method of Correlation of Ranks*, the correlation between $\sigma_1$ and $\rho_1$ has been calculated for the 20 Series; the result is

$$r_{\sigma_1\rho_1} = +.420 \pm .124.$$

Another coefficient which may be calculated, is that of the correlation between $\sigma_3$, or the standard deviation of first differences of consecutive judgments, and $\rho_1$; using the same method as for $r_{\sigma_1\rho_1}$, it is found that

$$r_{\sigma_3\rho_1} = -.416 \pm .125 \quad \text{and again} \quad r_{\sigma_1\sigma_3} = +.465 \pm .118.$$


Now $\rho_1$, $\sigma_1$ and $\sigma_3$ are not three independent quantities, as they are connected by the relation

$$\sigma_3 = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_{t+1}-y_t)^2} = \sigma_1\sqrt{2(1-\rho_1)},$$

and it is open to question, which two are the most fundamental. In the ordinary theory of the Combination of Observations, where it is assumed that $\rho_1$ is zero, it is natural to consider $\sigma_1$ (or $\sigma$) as a fundamental constant, the measure of the accuracy of judgment; $\sigma_3$ appears to have no special significance and merely equals $\sqrt{2}\,\sigma$. If however there is a correlation of successive judgments, $\sigma$ loses its importance; if we take a small number, $p$, of successive observations and calculate their standard deviation, $s_p$, we can no longer say that $s_p$, subject to its probable error $\pm .6745\,\dfrac{\sigma}{\sqrt{2p}}$, will be equal to $\sigma$, the standard deviation of a long series of judgments. On the other hand there is every reason to expect that the $\sigma_3$ found from a few observations will give a fair approximation to the $\sigma_3$ found from a large number. $\sigma$ is dependent to a high degree on the sessional change; for example it has been shown† that if this change can be represented by a straight line of the form $y = bt$, then $\sigma'$, or the standard deviation of the observations freed from this change is given by

$$\sigma'^2 = \sigma^2 - \frac{b^2}{12}(n^2 - 1).$$

It is true that $\sigma_3$ is dependent to some extent on the sessional change, but far less so; for instance in the case of the linear sessional change, $\sigma_3'$, the standard deviation of the first differences of the successive residuals left after the removal of the line, is given approximately by the relation

$$\sigma_3'^2 = \sigma_3^2 - b^2.$$
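The two relations above can be illustrated numerically: adding a line $y = bt$ to a set of residuals raises the variance by $b^2(n^2-1)/12$ but raises the mean square of the first differences by only $b^2$. The sketch below is a constructed example, not the paper's data: the slope and the residuals are hypothetical, the residuals are chosen symmetric about the middle of the series so that they are uncorrelated with $t$ and both identities hold exactly, and $\sigma_3^2$ is taken as the mean square of the first differences (an assumption about the definition).

```python
import math

def variance(values):
    """Mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def mean_square_of_differences(values):
    """Mean of the squared first differences of consecutive values."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return sum(d * d for d in diffs) / len(diffs)

n = 50
b = -0.003                 # hypothetical sessional slope per observation
centre = (n + 1) / 2
# Hypothetical residuals, symmetric about the centre of the series.
residuals = [0.04 * math.cos(2 * math.pi * (t - centre) / 7) for t in range(1, n + 1)]
observed = [e + b * t for t, e in enumerate(residuals, start=1)]

# Corrections to be removed according to the two relations in the text.
variance_correction = b * b * (n * n - 1) / 12
difference_correction = b * b

freed_variance = variance(observed) - variance_correction
freed_diff_mean_square = mean_square_of_differences(observed) - difference_correction
```

With these hypothetical numbers the correction to the variance is about 0.0019, whereas the correction to the mean square of first differences is 0.000009, which shows in miniature why the first-difference measure is so much less sensitive to the sessional change.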
And for any form of sessional change which is likely to occur in experiments 
of the type we are considering, the correction to the difference between two 
successive observations necessary to get the corresponding difference between the 


* p. 52 and footnote.
† Section V (d) p. 43.








residuals after the removal of the sessional term, will be very small indeed compared with the standard deviation of this difference, or $\sigma_3$. It is therefore suggested that in the combination of correlated observations, $\sigma_3$, the average value of the jump in estimation between two successive judgments, is of more fundamental importance than $\sigma$. As an example, consider the diagrams of the observations of Series X and Series XX in Figure 11; the correlation, $\rho_1$, is very low in both cases, but it is suggested that the physiological significance of the difference in type between the two, lies in the fact that $\sigma_3$ for Series X is nearly twice as large as $\sigma_3$ for Series XX, rather than in the difference in the $\sigma_1$'s. Or again in the diagrams of the Trisection Experiment, Figure 6, I would emphasise the same point in a comparison of the difference between the two highly correlated Series VIII and XVI.

Now returning to the coefficients of partial correlation

$$r_{xy.z} = +.529 \pm .109, \qquad r_{xz.y} = -.589 \pm .099.$$

With the interpretation suggested on p. 54 for these coefficients, we are led to a rather suggestive conclusion. If we are dealing with a number of series carried out at equal intervals of time in the course of one, or even perhaps two days, but effectively at one epoch when comparison is made with the long range of nearly 70 days covered by the Bisection Series, then the correlation between $x$ and $y$ is positive, or the pencil mark in the later series tends to be made further to the observer's right than in the earlier series; this change is in the same direction as the sessional change within a series. There is indeed a curious coincidence, on which of course no stress must be laid,

$$r_{xy.z} = +.529 \pm .109, \qquad r_{\bar{y}_t t} = +.5294 \pm .0137.$$

That is to say the correlation between the mean of a series and the order of that series when a number of series are done in close succession, is of the same sign and magnitude as the correlation between the mean $t$th observation and its order, $t$, in the series. But if we are dealing with all the $p$th series of sets which have been carried out on different days with varying and perhaps many days' interval between, then the coefficient $r_{xz.y}$ is negative, or the bisection-marks on the later days have on the whole a tendency to move to the left of the observer; this is in the direction of the secular change.

The conclusion which it seems possible to draw is this; if a number of series 
are done at very short intervals, the interval of rest between the series will not 
be sufficient to break the effect of the sessional change; but if a considerable 
interval elapses between the carrying out of the series, then the sessional change 
in one series has no influence on the judgments in the succeeding series, but a 
quite distinct secular change may be noticeable. In the Bisection Experiment 
both secular and sessional changes are very small, but they are acting in opposite 
directions. If these two changes are due to different physiological factors, it 
seems possible that it is the fact that they are acting in opposite directions in 
the Bisection Experiment which causes them to be of so much smaller magnitude 
than in the Trisection Experiment, where they were acting in the same direction. 

























(b) The Combination of the Series. 


For the combined series, the coefficients of correlation of successive judgments $R_k$ for $k = 1, 2 \ldots 13$ were calculated from 13 correlation tables each based on the 1000 combined observations; the results for $D_k$, $S_k$ and $R_k$ are tabled below (Table XI). The effect of the slight sessional change is noticeable in the increasing values of $D_k$.


Using the values of $D_k$, $S_k$ and $R_k$ and of the $d_k$'s from Table X, Equations (vi), (vii) and (viii) give $\dfrac{\Sigma(\rho_k\sigma_1\sigma_{k+1})}{m}$ and $\dfrac{\Sigma(\sigma_k^2)}{m}$ for $k = 1, 2 \ldots 14$. Equations (ix) and (x) then give the values of $S_k'$ and $R_k'$ contained in the 5th and 6th rows of Table XI. The value of $R_1'$ found by this method should be compared with that found with the help of the $\rho_1$'s, $\sigma_1$'s and $\sigma_2$'s of the individual series, namely

$$R_1' = \frac{\dfrac{\Sigma(\rho_1\sigma_1\sigma_2)}{m}}{\sqrt{\dfrac{\Sigma(\sigma_1^2)}{m}\cdot\dfrac{\Sigma(\sigma_2^2)}{m}}} = +.3578 \pm .0186 \quad \ldots\ldots\ldots \text{(x) bis}.$$

The difference which is well within the probable error arises from the fact that $R_1$ has been found by grouping the observations in a correlation table, while the $\rho_1$'s, $\sigma_1$'s and $\sigma_2$'s were found by direct multiplication of the crude values of the observations.


Another method of obtaining the $R_k'$'s is from the first difference correlation equations, or the method of Problem 1, p. 41; the results are given in the 7th row of Table XI, while the constants ${}_1R_k$, the coefficients of correlation of successive first differences required in the solution, are in the 8th row of the Table. Comparing the values of $R_k'$ found by the two methods, we find good agreement up to $k = 6$, but beyond this point the $R_k'$'s of the second and approximative method assume much too large negative values*. It is however evident from the results of the first method that $R_k'$ does become negative, and as it could not remain negative indefinitely as $k$ increased, there seems here to be another indication that a periodic variation exists among the judgments at any rate in a certain number of the series. For a complete period covering from 20 to 22 observations suggested by the $\bar{y}_t$ diagram, $R_k'$ should have a minimum value at $R_{10}'$ or $R_{11}'$; the figures suggest that the minimum occurs somewhat
earlier, at about R,’, but the probable errors for these small coefficients are very 
large. When time is available it would be interesting to examine further the 
significance of this periodicity. 

The points $(R_k, k)$ and $(R_k', k)$ have been plotted in Figure 12.

It will be noticed that the $S_k'$'s in the later groups are larger than in the
earlier, this suggesting again, as in the case of the Trisections, that the
observations become slightly more erratic towards the end of a series.


* This result tends to confirm the suggestion made on p. 60 that the difference correlation method
gave too large negative values for $R_k''$ in the Trisection Experiment.








TABLE XI.

Constants of Combined Series (Bisection).








ee 





a 







(c) Comparison with Experiment A. 


The difference between the results of the two experiments is probably due
to the fact that the estimation of a half is so much easier than the estimation of
a third. The variations in the latter observations are all on a larger scale than
in the former; the secular and sessional changes are very much greater, and if we
compare the values of the fundamental constants, we find:


























               $S_1'$     $S_\delta$     $R_1'$
Trisection    .0845     .0732     +.6246 ± .0130
Bisection     .0436     .0495     +.3519 ± .0187
Fig. 12. Experiment B. Bisection. Correlation-Interval Diagram ($R_k$ and $R_k'$ plotted against the interval between successive judgments).


and even after the removal of the greater part of the sessional change (the best
fitting straight lines) the coefficient $R_1''$ for the Trisections is +.4892, or greater
than $R_1'$ for the Bisections. The ratio of the values of $S_\delta$, or roughly 3 to 2, is a
measure of the relative uncertainty of the observer in making his estimate in the
two different Experiments.

There is some evidence for a slight periodicity in the judgments in the
Bisection Series; if there is any period in the Trisections it must cover at least
26 observations, for there is no indication of a significant increase in the values of
$R_k$ as far as calculated, i.e. up to $R_{13}$.


VIII. Experiment C. COUNTING OF 10 SECONDS. REDUCTION OF
OBSERVATIONS.


(a) The Individual Series. 

The values of $d_1$, $\sigma_1$ and $\rho_1$ for each of the 20 series are given in Table XII as
well as the hour and date; the means ($d_1$) have been plotted to the order of series
in Figure 13.


If $x$ is the mean in the factor e/p for a series,
$y$ the order of series,
$z$ the time in hours and fractions of an hour between 2.0 p.m. on December
13, and the commencement of series,
we have for the regression lines,
$x - .9186 = -.006056\,(y - 10.5)$ ...... (xxxvi),
$x - .9186 = -.001552\,(z - 38.24)$ ...... (xxxvii),





of which the first is represented in Figure 13.

Fig. 13. 10 Second Counting. Personal Equation in factor e/p (Means of Groups 1) plotted to y, Order or Number of Series, with the mean at .9186 and the regression x - .9186 = -.006056 (y - 10.5).

The coefficients of correlation are,

$r_{xy} = -.754 \pm .065$, $r_{xz} = -.775 \pm .060$, $r_{yz} = +.977 \pm .007$,








giving partial correlation coefficients

$r_{xy.z} = +.022 \pm .151$, $r_{xz.y} = -.271 \pm .140$.
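The step from the three total correlations to the two partial coefficients can be retraced numerically. The sketch below assumes the standard first-order partial correlation formula and the probable error $.6745(1-r^2)/\sqrt{n}$ with $n = 20$ series; the function names are illustrative and are not taken from the paper.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def probable_error(r, n):
    """Probable error of a correlation coefficient from n pairs (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Total correlations quoted for Experiment C (x: mean, y: order, z: time).
r_xy, r_xz, r_yz = -0.754, -0.775, 0.977

r_xy_z = partial_corr(r_xy, r_xz, r_yz)  # order of series, time held constant
r_xz_y = partial_corr(r_xz, r_xy, r_yz)  # time, order of series held constant

print(f"r_xy.z = {r_xy_z:+.3f} +/- {probable_error(r_xy_z, 20):.3f}")
print(f"r_xz.y = {r_xz_y:+.3f} +/- {probable_error(r_xz_y, 20):.3f}")
```

Because $r_{yz}$ is so close to unity, the partial coefficients are sensitive to the third decimal of the inputs, so agreement with the printed values only to about the second decimal is to be expected.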


The secular change corresponds to a gradual decrease in estimate throughout
the course of the experiment; the value of the factor e/p for a true 10 second
estimate would be 100/102 = .98, and this was closely approached by the means
of the first three series, which were carried out on the first day, shortly after trial
counts had been made with a watch. No further check with a watch was made
during the succeeding days, and the length of estimation decreased and finally
appears to have reached a fairly steady value at about .88. The mean for the
20 Series was .9186, or a count of 9.37 seconds.


TABLE XII.
Constants of Individual Series (Counting Seconds).

Series    $d_1$      $\sigma_1$      $\rho_1$      $\sigma_\delta = \sigma_1\sqrt{2(1-\rho_1)}$
I          .9786     .04030     +.5283     .0391
II        1.0140     .04331     +.4988     .0434
III        .9998     .03844     +.0625     .0526
IV         .9446     .03732     +.4027     .0408
V          .9128     .03394     +.4378     .0360
VI         .9090     .03015     +.5437     .0288
VII       1.0070     .03981     +.3819     .0443
VIII       .9012     .02488     +.4550     .0260
IX         .8886     .03934     +.4326     .0419
X          .9030     .02851     +.5439     .0272
XI         .9130     .02982     +.5326     .0288
XII        .8774     .01852     +.2850     .0221
XIII       .9046     .02402     +.4894     .0243
XIV        .9464     .02903     +.5085     .0288
XV         .8880     .04162     +.7589     .0289
XVI        .8812     .04947     +.8549     .0266
XVII       .8828     .03945     +.6566     .0327
XVIII      .8872     .02750     +.5406     .0264
XIX        .8468     .02486     +.1266     .0329
XX         .8864     .03345     +.6369     .0285

Date (1920) and Time at Start, in the order printed:
13 December: 2.30 p.m., 3.15 p.m., 3.45 p.m.
14 December: 10.15 a.m., 11.20 a.m., 12.0 noon, 2.30 p.m., 3.5 p.m., 3.35 p.m.
15 December: 10.0 a.m., 10.35 a.m., 11.10 a.m., 2.30 p.m., 3.5 p.m.
16 December: (illegible), 90 a.m., 10.30 a.m., 11.5 a.m., 11.35 a.m., 12.10 p.m.

Probable Errors of Coefficients of Correlation calculated from 50 pairs of the variates.

$\rho$      P.E.
.80     ± .0343
.70     ± .0486
.60     ± .0610
.50     ± .0715
.40     ± .0801
.30     ± .0868
.20     ± .0916
.10     ± .0944
.00     ± .0954
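The small table of probable errors can be regenerated from the usual expression for the probable error of a correlation coefficient, $.6745(1-\rho^2)/\sqrt{n}$, with $n = 50$. This is a minimal sketch; the function name is an illustrative choice.

```python
import math

def probable_error_r(rho, n=50):
    """Probable error of a correlation coefficient from n pairs of variates."""
    return 0.6745 * (1 - rho ** 2) / math.sqrt(n)

# The tabulated values of rho, from .80 down to .00.
pe_table = {rho: probable_error_r(rho) for rho in
            (0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00)}

for rho, pe in pe_table.items():
    print(f"{rho:.2f}   +/- {pe:.4f}")
```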


With the interpretation of p. 54, the insignificant value of the coefficient
$r_{xy.z}$ suggests that for a number of series done in quick succession, there will be
no change in personal equation; we shall therefore not expect to find any large
general sessional change in the series. The diagram of mean sessional change is
given in Figure 14, where $\bar{y}_t$ is plotted to $t$.

The equation of the straight line best fitting the points is
$\bar{y}_t - .919 = +.0000731\,(t - 32)$ ...... (xxxviii),
and has been drawn in Figure 14.

Using the relations and interpretation of page 48, it is found that

$\eta_{yt} = .212 \pm .018$ and $\sqrt{1 - \eta_{yt}^2} = .977$,

so that the mean sessional change is of even less significance than for the Bisections.
In fact it is clear from the diagram that the regression line (xxxviii) very nearly
coincides with the line of mean judgment, $y = .919$.





Fig. 14. 10 Second Counting. Diagram of Mean Sessional Change; t, Order of Observation in Series. Mean at .919; regression $\bar{y}_t$ - .919 = +.0000731 (t - 32). The value of factor for a true 10 seconds is .980.


The $\sigma_\delta$'s have been found for all the individual series, and using the values
of $S_1'$ and $R_1'$ given below, we have for the combined series
$S_\delta = S_1'\sqrt{2(1 - R_1')} = .0338$.
The method of correlation of Ranks gives
$r_{\rho_1\sigma_1} = +.329 \pm .135$,
showing again that large variation is associated with high correlation.
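The relation between the standard deviation of first differences and the constants $S_1'$ and $R_1'$ can be checked against the values the paper quotes for the four experiments. The sketch below assumes the reading $S_\delta = S_1'\sqrt{2(1-R_1')}$ and takes the tabulated constants as given; the dictionary layout is purely illustrative.

```python
import math

def sd_first_differences(s1_prime, r1_prime):
    """S.D. of differences of successive judgments with common S.D. and correlation."""
    return s1_prime * math.sqrt(2 * (1 - r1_prime))

# (S_1', R_1', quoted S_delta) as printed in the comparison tables.
quoted = {
    "Trisection": (0.0845, 0.6246, 0.0732),
    "Bisection": (0.0436, 0.3519, 0.0495),
    "Counting": (0.03435, 0.5200, 0.0338),
    "Estimating": (0.1410, 0.1984, 0.1785),
}

computed = {name: sd_first_differences(s, r) for name, (s, r, _) in quoted.items()}
for name, value in computed.items():
    print(f"{name:11s} computed {value:.4f}   quoted {quoted[name][2]:.4f}")
```

The computed and quoted values agree to within about .0002, which is the order of the rounding in the printed constants.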

In Figure 15 are given eight representative series graphs which provide a good
illustration of the variations in judgment. In the first two graphs (I and III),
$\sigma_\delta$ is large and there are many sudden fluctuations, but in the later series $\sigma_\delta$ is
lower and very constant in value. What may be described as the smoothness
in the change of judgment is in some cases particularly noticeable; for example in
the stretch between

Ys and y;;, Series VI 
Y. and Yo, Series XII} . 
Ys and Yx, Series XV | 

In making comparison with the similar diagrams for Trisections and Bisections 
allowance must be made for the differences in scale, but I think it is clear that this 
“smoothness” or gradual variation is a special feature of the 10 second counting ; 
there is for instance no diagram of Trisections or Bisections which can compare 
with that of Series XVI of the counting, for high correlation combined with very 
gradual variation. But such a result is not unexpected, if the procedure of the 
experiment with the continuous counting be remembered. 


A further point of interest is to examine how far a sudden "break" or
discontinuity in the length of estimate influences the succeeding judgments. Among the
1000 observations forming the Groups 1 of the 20 series there are 61 "breaks" or
differences between successive judgments of .07 or over (in terms of the factor e/p),
i.e. of over twice $S_\delta$, the standard deviation of first differences. In the diagram of
Figure 16, ten observations are represented; the break between $y_{t-1}$ and $y_t$, or
$y_t \sim y_{t-1}$, is supposed to be equal to, or greater than, .07. If this large break
influences the succeeding judgments, it is to be expected that the differences
$y_{t+1} \sim y_t$, $y_{t+2} \sim y_t$, ... etc. will be smaller on the average than the differences
$y_t \sim y_{t-1}$, $y_t \sim y_{t-2}$, ... etc.
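The procedure just described can be sketched as follows. This is an illustration only: the series used is invented, the function names are hypothetical, and the "standard deviation of differences" is taken here as the root-mean-square difference, which is an assumption about the paper's computation.

```python
import math

def find_breaks(y, threshold=0.07):
    """Positions t where the judgment differs from its predecessor by >= threshold."""
    return [t for t in range(1, len(y)) if abs(y[t] - y[t - 1]) >= threshold]

def rms_differences(y, breaks, max_lag=5):
    """Root-mean-square of y[t] - y[t + lag] over the break positions t.

    Negative lags compare the judgment after the break with previous
    judgments, positive lags with succeeding judgments.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag == 0:
            continue
        diffs = [y[t] - y[t + lag] for t in breaks if 0 <= t + lag < len(y)]
        if diffs:
            result[lag] = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return result

# Invented series (factor e/p) with one sudden jump between the 3rd and 4th judgments.
series = [0.90, 0.91, 0.90, 0.99, 0.98, 0.99, 0.98]
breaks = find_breaks(series)
spread = rms_differences(series, breaks, max_lag=3)
print(breaks, spread)
```

If a break persists, the values at positive lags come out smaller than those at negative lags, which is the comparison made in Table XIII.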





Fig. 15. 10 Second Counting. Diagrams representing variations in judgment (Series I, III, VI, XII, XV, XVI, XIX and XX). The horizontal line intersecting each graph gives the mean of the first 50 observations in that series.





In the first row of Table XIII are given the standard deviations of these
differences taken from the 61 breaks; now in 14 of these cases there is what may
be called a "double break," that is, after making one large variation to $y_t$ the
judgment returns approximately to its previous state, both $y_t \sim y_{t-1}$ and $y_t \sim y_{t+1}$
being greater than or equal to .07. While such cases may represent true variations
in judgment, it is very possible that they result from some accidental error, a
slowness in pressing the tapping key or in catching up the counting at the
commencement of the observation, which was realised at the time and was not due to
a real change in estimate. In the second row of the Table, therefore, are given the
standard deviations taken from the 33 sets where there was no double break, and,
in the third row, the standard deviations of 1st differences (taken from the whole
1000 judgments)
between $y_t$ and $y_{t\pm1}$, $= S_\delta = S_1'\sqrt{2(1-R_1')}$,
between $y_t$ and $y_{t\pm2}$, $= S_1'\sqrt{2(1-R_2')}$,
between $y_t$ and $y_{t\pm3}$, $= S_1'\sqrt{2(1-R_3')}$, etc.

Fig. 16. Experiment C. Effect of a large break in judgment.
TABLE XIII.

Standard Deviations of Differences between Judgment after "Break" $y_t$
and the Judgments $y_{t-5}$ to $y_{t+5}$.

                               Previous Judgments                            Succeeding Judgments
    Number of Judgments    $y_{t-5}$   $y_{t-4}$   $y_{t-3}$   $y_{t-2}$   $y_{t-1}$     $y_{t+1}$   $y_{t+2}$   $y_{t+3}$   $y_{t+4}$   $y_{t+5}$
1   From 61 sets           .0692    .0647    .0624    .0624    .0851      .0476    .0541    .0553    .0549    .0623
                          ±.0042   ±.0040   ±.0038   ±.0038   ±.0052     ±.0029   ±.0033   ±.0034   ±.0034   ±.0038
2   From 33 sets           .0757    .0682    .0636    .0667    .0810      .0346    .0421    .0508    .0483    .0635
                          ±.0063   ±.0057   ±.0053   ±.0055   ±.0067     ±.0029   ±.0035   ±.0042   ±.0040   ±.0053
3   From total 1000        .0442    .0416    .0401    .0373    .0338      .0338    .0373    .0401    .0416    .0442

The probable errors are calculated from the usual expression, $\pm .6745\,\sigma/\sqrt{2n}$.
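The probable errors in the first two rows of Table XIII follow from the expression just quoted. A minimal sketch, with an illustrative function name, reproducing a few of the tabulated entries:

```python
import math

def probable_error_sd(sigma, n):
    """Probable error of a standard deviation found from n observations."""
    return 0.6745 * sigma / math.sqrt(2 * n)

# A few entries of Table XIII: (standard deviation, number of sets).
pe_61_first = probable_error_sd(0.0692, 61)   # y_{t-5}, 61 sets
pe_61_break = probable_error_sd(0.0851, 61)   # y_{t-1}, 61 sets
pe_33_first = probable_error_sd(0.0757, 33)   # y_{t-5}, 33 sets
pe_33_after = probable_error_sd(0.0346, 33)   # y_{t+1}, 33 sets

for value in (pe_61_first, pe_61_break, pe_33_first, pe_33_after):
    print(f"+/- {value:.4f}")
```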


If we consider the values of these standard deviations together with their
probable errors, we may say definitely that the effect of a large break or
discontinuity in judgment is quite significant, and that the influence appears to last for
at least four or five judgments. It cannot of course be decided whether the
breaks were caused by some chance external factor, or were due to a conscious
change in estimate made by the observer on deciding, whether rightly or wrongly,
that his second count was too short or too long*.

It will be noticed that the standard deviations of differences between these
special pairs of judgments are in all cases greater than the corresponding standard
deviations from the total 1000 judgments; this is to be expected, for the
judgments $y_t$ from which all the differences are taken are not a random selection of 61
(or 33) judgments, but include many of the most erratic and therefore those
furthest from the mean.


(b) The Combination of the Series. 


In combining the twenty series, $D_k$, $S_k$ and $R_k$ were calculated from the thirteen
correlation tables of the judgments, and the values of these constants are given in
Table XIV below. A glance at any one of the correlation tables showed that the
1000 judgments in any group did not follow a normal distribution, and in order
to get a measure of this, the coefficient of skewness for the 1000 judgments in
the combined Groups 1 (i.e. for the judgments $y_1, y_2 \ldots y_{50}$ of the twenty series)
was calculated from the expression

$$\text{Skewness} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2\,(5\beta_2 - 6\beta_1 - 9)},$$

where $\beta_1$ and $\beta_2$ are the fundamental ratios of the moments about the mean given by

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

The result was as follows:
$\beta_1 = .2726$, $\beta_2 = 2.9739$, Skewness $= .3684 \pm .0339$,
showing a very significant degree of skewness, and the frequency follows a Type I
curve of limited range.

The distribution of these 1000 observations made within a period of four
consecutive days, gives but another example of the frequent inapplicability of the
Normal Error Law.
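The quoted skewness can be retraced from the two moment ratios. The sketch below implements the expression for the skewness given above and, as an assumption about how the ratios would be obtained from raw data, a helper computing $\beta_1$ and $\beta_2$ from moments about the mean; both function names are illustrative.

```python
import math

def beta_ratios(values):
    """Return (beta_1, beta_2) from the moments about the mean of a sample."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    mu4 = sum((v - mean) ** 4 for v in values) / n
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def pearson_skewness(beta1, beta2):
    """Skewness = sqrt(beta_1) (beta_2 + 3) / (2 (5 beta_2 - 6 beta_1 - 9))."""
    return math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))

# Values quoted for the 1000 judgments of the combined Groups 1.
skewness = pearson_skewness(0.2726, 2.9739)
print(f"Skewness = {skewness:.4f}")
```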

Using the values of $\rho_1$, $\sigma_1$ and $\sigma_2$, $R_1'$ is obtained from

$$R_1' = \frac{\sum_m(\rho_1\sigma_1\sigma_2)}{\sqrt{\sum_m(\sigma_1^2)\,\sum_m(\sigma_2^2)}} = +.5200 \pm .0156 \qquad \text{(x) bis,}$$

and the remaining values of $R_k'$, $k = 2, \ldots 13$, by the approximate method of
Problem 1, p. 41. Perhaps the chief source of error in the method is variation
in $S_k$, which has been assumed constant; in this experiment the range of $S_k$ is only
1.8% compared with 3.6% for the Trisections and 2.5% for the Bisections, and the
results which are contained in the 6th row of Table XIV may be regarded,
therefore, with reasonable confidence. As before, for the higher values of $k$, $R_k'$ may be
a little too low, and as a test of the amount of cumulative error which may be
affecting $R_{13}'$, I have worked out this constant directly from the relations

$$R_{13}' = \frac{R_{13}\,S_1 S_{14} - \frac{1}{m}\sum_m (D_1 - d_1)(D_{14} - d_{14})}{S_1'\,S_{14}'},$$

$$S_1^2 = S_1'^2 + \frac{1}{m}\sum_m (D_1 - d_1)^2, \qquad S_{14}^2 = S_{14}'^2 + \frac{1}{m}\sum_m (D_{14} - d_{14})^2,$$

* Eleven definite interruptions in the ordinary routine of counting, due to a mistap on the key or a
miscount of the 10 seconds, were recorded at the time of observation, but only three of these resulted
in breaks of judgment > .07, the limiting value taken in the above investigation.


TABLE XIV.
Constants of Combined Series (Counting Seconds).

$S_1'$ (1) was obtained from a relation in the $\sigma_1$'s of the individual series, the values of $\sigma_1$ being obtained without grouping.
$S_1'$ (2) was obtained from the grouped data of the correlation tables. Its probable error is ± .00052.
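The relations connecting the correlation within series ($R_k'$) with the correlation found from the combined observations ($R_k$) can be written as a pair of inverse functions. The sketch below is an assumption-laden illustration: the argument names are hypothetical, `b11`, `b22` and `b12` stand for the mean squares and mean product of the deviations $(D - d)$ of the series means, and the numbers in the example are invented, not taken from Table XIV.

```python
import math

def combined_from_within(r_within, s1_within, s2_within, b11, b22, b12):
    """R_k of the combined series from R_k' within series plus the between-series terms."""
    numerator = r_within * s1_within * s2_within + b12
    denominator = math.sqrt((s1_within ** 2 + b11) * (s2_within ** 2 + b22))
    return numerator / denominator

def within_from_combined(r_combined, s1_combined, s2_combined, b11, b22, b12):
    """R_k' within series from the combined R_k (the direct calculation described above)."""
    s1_within = math.sqrt(s1_combined ** 2 - b11)
    s2_within = math.sqrt(s2_combined ** 2 - b22)
    return (r_combined * s1_combined * s2_combined - b12) / (s1_within * s2_within)

# Invented illustration: within-series S.D. 0.034, between-series mean square 0.0020.
s_within, between = 0.034, 0.0020
s_combined = math.sqrt(s_within ** 2 + between)

r_combined = combined_from_within(0.25, s_within, s_within, between, between, between)
r_back = within_from_combined(r_combined, s_combined, s_combined, between, between, between)

# Limit of R_k when R_k' vanishes and the between-series terms are all equal.
limit = combined_from_within(0.0, s_within, s_within, between, between, between)
print(r_combined, r_back, limit)
```

With $R_k' = 0$ the combined coefficient reduces to the ratio of the between-series mean square to the total variance, which is the limiting value the paper denotes $L_k$.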





Fig. 17. Experiment C. 10 Second Counting. Correlation-Interval Diagram (correlation of successive judgments, $R_k$ and $R_k'$, plotted against the interval between successive judgments).














with the following results:
$R_{13}' = -.0124 \pm .0213$, $L_{13} = +.632$*,
$S_1' = .03426$, $S_{14}' = .03444$.
R,;, and presumably R,,’ are not therefore significantly negative, and it seems 
probable that $R_k'$ tends to zero as $k$ increases, without oscillating about that value.
The points $(k, R_k)$ and $(k, R_k')$ are plotted in Figure 17; the theoretical curve
drawn in the diagram will be referred to in Section XI.

* $L_k$, or the limit to which $R_k$ approaches as $R_k'$ tends to zero, is discussed on p. 34.




























IX. Experiment D. ESTIMATION OF 10 SECONDS. 
REDUCTION OF OBSERVATIONS. 

(a) The Individual Series. 

In Table XV are given the values of $d$ (the mean of the 63 observations of a
series, not those of Group 1 only), and of $\sigma_1$ and $\rho_1$ for the individual series; the
low values of $\rho_1$ will be noted at once, and also the high values of $\sigma_1$ compared with
those in the Counting Experiment.


TABLE XV.
Constants of Individual Series (Estimate of Seconds).

Series    $d$        $\sigma_1$      $\rho_1$               Time of Start    Date (1920)
I         1.151     .1217     +.1518 ± .0932     10.45 a.m.       7th December
II        (illegible)  .1254  +.2332 ± .0902     11.30 a.m.       7th December
III       1.109     .1330     -.0249 ± .0953     12.10 p.m.       7th December
IV        1.052     .1393     +.1803 ± .0923     2.0 p.m.         7th December
V          .973     .1292     +.2632 ± .0888     3.0 p.m.         7th December
VI        1.119     .1349     +.1300 ± .0938     10.15 a.m.       8th December
VII       1.011     .1312     +.3673 ± .0825     11.0 a.m.        8th December
VIII      1.073     .1318     +.1631 ± .0929     2.0 p.m.         8th December
IX        1.003     .1108     +.1976 ± .0917     2.30 p.m.        8th December
X         1.089     .0989     +.0380 ± .0953     3.15 p.m.        8th December
XI        1.204     .1519     +.3405 ± .0843     10.0 a.m.        9th December
XII       1.204     .1467     +.1415 ± .0935     11.0 a.m.        9th December
XIII      1.091     .1166     +.3241 ± .0854     12.0 midday      9th December
XIV       1.036     .1059     +.0566 ± .0951     2.0 p.m.         9th December
XV        1.132     .1884     +.4814 ± .0733     3.15 p.m.        9th December
XVI       1.170     .1500     +.1036 ± .0944     10.0 a.m.        10th December
XVII      1.421     .1520     -.0834 ± .0947     11.0 a.m.        10th December
XVIII     1.300     .1591     +.2314 ± .0903     12.0 midday      10th December
XIX       1.243     .1708     +.2260 ± .0905     2.0 p.m.         10th December
XX        1.170     .1833     +.1659 ± .0928     2.45 p.m.        10th December

Correlation between $\rho_1$ and $\sigma_1$, $r_{\sigma_1\rho_1} = +.176 \pm .146$ (calculated from correlation of ranks).

In Figure 19 below the means have been plotted to order of series, and if

$x$ is the mean in the factor e/p,
$y$ the order of series,
$z$ the time in hours and fractions of an hour between 10 a.m. on December 7th,
and the commencement of series,

we have for the regression lines
$x - 1.1333 = +.01018\,(y - 10.5)$ ...... (xxxix)
$x - 1.1333 = +.002493\,(z - 38.62)$ ...... (xl).
The coefficients of correlation are

$r_{xz} = +.638 \pm .089$, $r_{xy} = +.562 \pm .103$, $r_{yz} = +.983 \pm .005$,

giving partial correlation coefficients
$r_{xz.y} = +.570 \pm .102$, $r_{xy.z} = -.470 \pm .118$.


These latter coefficients suggest that the secular change for observations spread 
over a number of days will be a lengthening in estimation, but that, if a number 
of series are done in rapid succession, the tendency will be for a shortening; in 
fact we should expect the sessional change to be in the opposite direction to the 
secular, as for the Bisections. 





Fig. 18. 10 Second Estimation. Diagram of Mean Sessional Change; t, Order of Observation in Series.


The values of $\bar{y}_t$ have been plotted in Figure 18; the best fitting line has not
been calculated, but it would certainly correspond very closely with the mean,
$y = 1.1333$. There is in fact apparently no mean sessional change, though the drop
in the last eight values of $\bar{y}_t$ may be significant, and a mark of the tendency
suggested by the negative value of $r_{xy.z}$.

In Figure 19 the centres of the small circles represent the positions of the
means of the 63 observations of each series; these points have been fitted with
the cubic
$x = 1.093971 + .022116\,(y - 10.5) + .001174\,(y - 10.5)^2 - .0002002\,(y - 10.5)^3$ ...... (xli),
which is the middle of the three curves. There is evidence of a slight secular
change, the length of the estimation increasing towards the end of the experiment.
If however it is remembered that the 20 series were carried out in 4 days, it will
be seen that there is in general a decrease in estimation in the course of the 5 series
done in any one day. It is this daily drop that the coefficient $r_{xy.z}$ (= -.470) is
picking out. Now in addition to the secular change in personal equation, the
figures in Table XV suggest that there is also a secular change in standard
deviation. The vertical lines on each side of the series-means in Figure 19 equal in
length the corresponding standard deviations, or $\sigma_1$'s. These values of $\sigma_1$ have been
fitted with the cubic

$x' = .129006 + .001072\,(y - 10.5) + .000302\,(y - 10.5)^2 + .0000214\,(y - 10.5)^3$ ...... (xlii),

and the other two curves in the diagram have ordinates equal to $x + x'$ and $x - x'$,
so that the distance between the central curve and either of the outer curves, gives
the smoothed value of the standard deviation at the point. The diagram provides
a generalised representation of a secular change in personal equation and standard
deviation.
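The three curves of Figure 19 can be regenerated from the two cubics. The sketch below uses the coefficients of (xli) and (xlii) as read from the scan; the squared power on the third term of (xlii) is an assumption (the exponent is not legible), and the function names are illustrative.

```python
def cubic(u, c0, c1, c2, c3):
    """Evaluate c0 + c1 u + c2 u^2 + c3 u^3."""
    return c0 + c1 * u + c2 * u ** 2 + c3 * u ** 3

# Coefficients in powers of (y - 10.5), y being the order of series.
MEAN_COEFFS = (1.093971, 0.022116, 0.001174, -0.0002002)  # cubic (xli)
SD_COEFFS = (0.129006, 0.001072, 0.000302, 0.0000214)     # cubic (xlii), exponent assumed

def smoothed_band(series_order):
    """Return (x - x', x, x + x') for a given order of series."""
    u = series_order - 10.5
    centre = cubic(u, *MEAN_COEFFS)
    spread = cubic(u, *SD_COEFFS)
    return centre - spread, centre, centre + spread

bands = [smoothed_band(y) for y in range(1, 21)]
mean_of_centres = sum(centre for _, centre, _ in bands) / len(bands)

for y, (lower, centre, upper) in enumerate(bands, start=1):
    print(f"{y:2d}  {lower:.4f}  {centre:.4f}  {upper:.4f}")
```

Averaged over the 20 series the central cubic returns a value close to the general mean 1.1333, as a least-squares fit should.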


The factor for a true 10 second interval would be 100/102 = .98, and was most
nearly approached by the means of Series V, VII and IX, while in the case of
XVII the mean estimation nearly reached the high value of 15 seconds.


Fig. 19. Distribution of Personal Equation in Estimating Seconds, plotted against Place of Series in Order.
(b) The Combination of the Series.

In combining the twenty series, $D_k$, $S_k$ and $R_k$ were calculated from the thirteen
correlation tables of the observations of the combined series. Using the correlations
and standard deviations of the separate series, $R_1'$ is obtained from

$$R_1' = \frac{\sum_m(\rho_1\sigma_1\sigma_2)}{\sqrt{\sum_m(\sigma_1^2)\,\sum_m(\sigma_2^2)}} = +.19841 \pm .02049 \qquad \text{(x) bis,}$$

and $S_1' = .14101$, $S_2' = .14056$.


Then using this value of $R_1'$, and the first difference correlation equations
(Problem 1, p. 41), $R_k'$ can be calculated for $k = 2, \ldots 12$. The values of these
quantities are given in the Table XVI below.
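The pooled constants can be retraced from the individual series. The sketch below takes the $\sigma_1$ and $\rho_1$ columns of Table XV as reconstructed from the scan and assumes that $S_1'^2$ is the mean of the $\sigma_1^2$. Since the $\sigma_2$'s are not tabulated, the pooled correlation is computed with $\sigma_2$ replaced by $\sigma_1$, which is an approximation introduced here and not the paper's calculation.

```python
import math

# sigma_1 and rho_1 for Series I to XX, as read from Table XV.
sigma_1 = [0.1217, 0.1254, 0.1330, 0.1393, 0.1292, 0.1349, 0.1312, 0.1318,
           0.1108, 0.0989, 0.1519, 0.1467, 0.1166, 0.1059, 0.1884, 0.1500,
           0.1520, 0.1591, 0.1708, 0.1833]
rho_1 = [0.1518, 0.2332, -0.0249, 0.1803, 0.2632, 0.1300, 0.3673, 0.1631,
         0.1976, 0.0380, 0.3405, 0.1415, 0.3241, 0.0566, 0.4814, 0.1036,
         -0.0834, 0.2314, 0.2260, 0.1659]

def pooled_sd(sigmas):
    """Root of the mean within-series variance."""
    return math.sqrt(sum(s * s for s in sigmas) / len(sigmas))

def pooled_correlation(rhos, sig_a, sig_b):
    """Sum(rho * sig_a * sig_b) / sqrt(Sum(sig_a^2) * Sum(sig_b^2))."""
    numerator = sum(r * a * b for r, a, b in zip(rhos, sig_a, sig_b))
    denominator = math.sqrt(sum(a * a for a in sig_a) * sum(b * b for b in sig_b))
    return numerator / denominator

s1_prime = pooled_sd(sigma_1)
# Approximation: sigma_2 is not tabulated, so sigma_1 is used in its place.
r1_prime_approx = pooled_correlation(rho_1, sigma_1, sigma_1)
print(f"S_1' = {s1_prime:.5f}   approximate R_1' = {r1_prime_approx:.4f}")
```

The pooled standard deviation reproduces the quoted $S_1' = .14101$ closely, and the approximate pooled correlation falls near the quoted $+.19841$, the remaining difference being attributable to the substitution of $\sigma_1$ for $\sigma_2$.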





mS SE Pe 







The fall in R, is small, and although there is considerable irregular variation 
from R, onwards, it appears that R, will not vanish as & increases, but approach a 
constant value in the neighbourhood of +.35. This can be tested; we have from
the equations (vi) to (x)

$$R_k = \frac{R_k'\,S_1' S_{k+1}' + \frac{1}{m}\sum_m (D_1 - d_1)(D_{k+1} - d_{k+1})}{\sqrt{\left\{S_1'^2 + \frac{1}{m}\sum_m (D_1 - d_1)^2\right\}\left\{S_{k+1}'^2 + \frac{1}{m}\sum_m (D_{k+1} - d_{k+1})^2\right\}}} \qquad \text{(xliii),}$$
TABLE XVI.
Constants of Combined Series (Estimating Seconds).

$S_\delta = S_1'\sqrt{2(1 - R_1')} = .1785$.
and as the sessional change for the series is very small, we may make the
approximation

$$\frac{1}{m}\sum_m (D_1 - d_1)(D_{k+1} - d_{k+1}) = \frac{1}{m}\sum_m (D_1 - d_1)^2 = \frac{1}{m}\sum_m (D_{k+1} - d_{k+1})^2 \quad \text{for all values of } k,$$

and in view of the constancy of $S_k'$,
$S_1' = S_{k+1}'$, for all values of $k$.
Then on the assumption that there is no significant periodic variation in the
observations,
$R_k' \to 0$ as $k$ increases,
and from (xliii)

$$R_k \to \frac{\frac{1}{m}\sum_m (D_1 - d_1)^2}{S_1'^2 + \frac{1}{m}\sum_m (D_1 - d_1)^2} = +.354.$$

The correlations $R_k'$ become rapidly insignificant; the values tabulated are of
course subject to the errors of the method of approximation, but as in the case of
the 10 second Counting Experiment, these should not be large owing to the
constancy of $S_k'$*. The points $(k, R_k)$ and $(k, R_k')$ are plotted in Figure 20; the two
curves there drawn will be referred to in Section XI below.


* The difference between $S_1'$ and $S_{13}'$ is one of 1.4% only.











(c) Comparison of Experiments C and D.

It has been found that in both the Counting and the Estimating Experiments
there is evidence of a secular change in personal equation, and that in both cases
the tendency is for the estimates to depart further from the true value of 10 seconds
in the later series; in the Counting Seconds there is a decrease, in the
Estimating Seconds an increase in length of estimate. There is also very little evidence
of regular sessional change in either experiment.


Fig. 20. Experiment D. 10 Second Estimating. Correlation-Interval Diagram ($R_k$ and $R_k'$ plotted against the interval between successive judgments).


Beyond this the similarity ceases; it is only necessary to compare the values of
the chief constants (defined on p. 36),

               $S_1'$      $S_\delta$      $R_1'$
Counting      .03435     .0338     +.5200 ± .0156
Estimating    .1410      .1785     +.1984 ± .0205




The variations in judgment in the Estimating Experiment are very large com- 
pared with those in the other, and at the same time there is low correlation 
between successive judgments, so that the observations will be found to be scattered 
far more nearly in accordance with the Normal Error Law than in the three 
preceding experiments. In the case of the Counting Experiment, the skew distri- 
bution of the 1000 observations has already been referred to. But for one or two 
exceptions (as III and XIX) the individual series in the Counting conform more 
closely to a general type than in the Trisections or Bisections, and this results in 
the very smooth values of the constants $R_k$ and $R_k'$.


X. EXPERIMENT E. PLATE MEASUREMENTS WITH ZEISS COMPARATOR.


The values of $\rho_1$ only have been calculated; these, with $\sigma_1$ and a brief description
of the nature of the marking measured, are given in the Table XVII; $\sigma_1$ is in
millimetres. Series I-VIII involved settings of both slide and micrometer, IX of
micrometer only.


No great weight can be attached to the result of one series of 50 readings on a
marking, but it is justifiable to draw certain conclusions from the results of the
eight series. In the first place, there appears to be a significant correlation between
the successive measures of the edge of a band (I and II), but in measuring the
centres, i.e. in bisecting a bright maximum with the cross wire, there is on the
whole no correlation. This perhaps might be expected; the edges of bands or
maxima in photographic spectra are not quite sharply cut, so that some uncertainty
must exist in the observer's mind as to where the real edge should be taken to be;
his opinion on this point may vary throughout the course of the sitting, and
consequently correlation will be found between the successive readings. On the other
hand, in the bisection of a narrow maximum, there will be little doubt as to the
position of the centre; the real estimate of the observer will vary but slightly, and
the variations in the reading will be due mainly to failure in breaking off the push
or pull of the slide at the right moment. It is possible that unconscious "over-pulls"
or "under-pulls" may go in runs together, but the measures seem to show that
this is not the case, and that the correlation of successive judgments is due rather
to correlated changes of mental estimate than to those of a more physical character.
If it were more difficult to bisect a maximum, if there were greater opportunity for
variation, it is probable that there would be a correlation of successive judgments,
and this is perhaps illustrated by the case of Series VIII, which has the largest
standard deviation (.0041) and also a correlation ($\rho_1 = +.227 \pm .091$) possibly
significant.


TABLE XVII.

Series    $\rho_1$              $\sigma_1$      Description of marking
I         +.384 ± .081     .0016     Sharp edge of bright band (Edge)
II        +.467 ± .075     .0015     Slightly vaguer edge than I (Edge)
III       +.117 ± .094     .0007     Clear and narrow maximum (Centre)
IV        +.090 ± .095     .0012     Clear and narrow maximum (Centre)
V         +.021 ± .095     .0016     Clear and narrow maximum (Centre)
VI        -.001 ± .095     .0019     Broad and obscure maximum (Centre)
VII       -.050 ± .095     .0022     Broad and obscure maximum (Centre)
VIII      +.227 ± .091     .0041     Broad and obscure maximum (Centre)
IX        +.288 ± .087     .0004     Micrometer screw settings only


The result of IX suggests that there is a correlation between successive settings 
of the micrometer wires in the second eyepiece; this correlation would of course 
enter into the results of I to VIII, but the standard deviation of IX (.0004) is so
small that the effect will be insignificant where the variations in slide settings are 
large. 

As a matter of practical application these results serve to emphasise the 
importance of the routine of measurement usually adopted ; if, for example, it is 
proposed to take four readings of each of a number of markings on a plate, the four 
readings should not be made in succession, but all the markings should be measured 
once, and then perhaps a short interval taken before the second measuring is made, 
and so on. This method should eliminate the error in the mean of several measure- 
ments of a marking, which may arise from a correlation of successive judgments, as 
well as errors due to change in temperature of instrument or plate, etc.


XI. ANALYSIS OF THE CORRELATION BETWEEN SUCCESSIVE JUDGMENTS. 


(a) The Theory of correlated Estimates and accidental Errors. 


It has been seen that in the case of the Bisection and Timing Experiments
when the secular term was removed the coefficients of correlation of the successive
judgments, or the constants $R_k'$, diminished to approximately zero values as $k$, the
interval between the judgments correlated, was increased. In the Trisection
Experiment, owing to the marked sessional change which was repeated in practically
all the series, $R_k'$ appeared to approach a value of +.16 and not zero as $k$ was
increased; the sessional change in this case appeared to be of parabolic rather than
linear form, and it seemed possible that if the ordinates of the “best” fitting
parabola of each series were removed from the observations, the coefficients of
correlation of the residuals, or the $R_k''$'s, would tend to zero as $k$ increased, as in the
case of the other three experiments in which there was no large sessional change.
The points representing the values of $R_k'$ which have been plotted in Figures 8,
12, 17 and 20 appear on the whole to lie so nearly on a smooth curve, that it is of
no little interest to inquire whether we can obtain equations to such curves based
on some definite theory of the physiological factors underlying the variations in an
observer’s judgment.










In the first place we have seen that neither a secular change in personal equa- 
tion—the variation in series means—nor a simple sessional change such as that 
represented by the straight line or by a second order parabola considered in the 
Trisection Experiment, will account for the whole of the correlation of successive 
judgments. We must therefore conclude that quite apart from the large scale vari- 
ations in judgment which are due to the more gradual changes of state in the 
observer resulting, perhaps, from experience or fatigue, there is a definite relationship 
between the small scale variations in judgment; if judgment $y_t$ is greater than the
average of the five or six preceding judgments, then we shall on the whole
expect that $y_{t+1}$, the next judgment, will also be greater. I propose therefore to
consider what results will follow from the assumption that $y_t$ has a correlation $r$
with $y_{t-1}$ and $y_{t+1}$, but that for $y_{t-1}$ or $y_{t+1}$ constant it has no partial correlation with
$y_{t-2}$ and $y_{t+2}$ or judgments at greater intervals. In other words we will suppose
that the observer's estimation at any moment is only influenced by the preceding 
estimation, and only through this, and not directly, by the earlier estimations. 

Let us take the successive judgments $y_t, y_{t+1}, y_{t+2}, \dots, y_{t+k}, \dots$ and suppose that
the total correlation between $y_t$ and $y_{t+k}$ is $\rho_k$, where $k = 1, 2, 3, \dots$, and $\rho_1 = r$. If
there is no partial correlation between $y_t$ and $y_{t+2}$, $y_{t+1}$ being constant, we must
have

$$\rho_2 - \rho_1^2 = 0 \quad \text{or} \quad \rho_2 = r^2.$$

In the same way if there is no partial correlation between $y_t$ and $y_{t+3}$ when $y_{t+1}$
(or $y_{t+2}$) is constant,

$$\rho_3 - \rho_1\rho_2 = 0 \quad \text{or} \quad \rho_3 = r^3,$$

and in general we find that

$$\rho_k = r^k \qquad \text{(xliv)}.$$
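
The step from zero partial correlation to the geometric law can be checked numerically. The sketch below is an illustration only: the function names and the value r = 0.7 are assumptions introduced here, and the standard first-order partial correlation formula is used to confirm that the law (xliv) makes the partial correlations at intervals of two and three vanish.

```python
from math import sqrt


def partial_correlation(r_ac, r_ab, r_bc):
    """First-order partial correlation of a and c with b held constant."""
    return (r_ac - r_ab * r_bc) / sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))


def total_correlation(r, k):
    """Total correlation rho_k = r**k between judgments k steps apart, as in (xliv)."""
    return r ** k


r = 0.7  # illustrative value only, not taken from the experiments

# y_t and y_{t+2}, with y_{t+1} held constant
p2 = partial_correlation(total_correlation(r, 2),
                         total_correlation(r, 1),
                         total_correlation(r, 1))

# y_t and y_{t+3}, with y_{t+1} held constant
p3 = partial_correlation(total_correlation(r, 3),
                         total_correlation(r, 1),
                         total_correlation(r, 2))

print(p2, p3)  # both vanish up to floating-point rounding
```

The numerators are exactly the expressions $\rho_2 - \rho_1^2$ and $\rho_3 - \rho_1\rho_2$ of the text, so the check is only a restatement of the argument in executable form.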


In reaching this simple result there is a point however that has been overlooked ; 
it has been assumed that there is some physiological or psychological significance 
in the correlation of an estimate of a quantity and in the preceding estimate, but 
it must be remembered that the value which the observer records may not be 
exactly that which he wished to record, or in other words he may be unable to 
record his true estimate. Thus in bisecting a line it is likely that the pencil point 
will not strike the paper exactly at the spot intended, or in counting 10 seconds
the tapping of the key may not be exactly synchronised with the beginning or end 
of the count, and there may be many other little external influences of which the 
observer is unaware, which will all combine to form what may be termed an acci- 
dental error superimposed upon the true correlated estimation. Let us examine 
how the relation (xliv) will be modified by introducing the idea of these accidental 
and uncorrelated errors; we must suppose that the observer's recorded judgment
$y_t$ is made up of two parts, $\alpha_t$ his actual estimate at the moment of record and $\beta_t$
some complex of accidental errors affecting his record. Then

$$y_t = \alpha_t + \beta_t \qquad \text{(xlv)}.$$


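Now if we assume that the accidental errors $\beta_t$ are as like to be positive as
negative, and that they will not be correlated in any manner among themselves
nor with the fundamental part of the judgment $\alpha_t$, we shall have the following
approximate relations

$$\left.\begin{aligned}
\sum_{t=1}^{N} \beta_{t+k} &= 0 \quad \text{for } k = 1, 2, 3, \dots, \text{ where } N \text{ is large compared with } k,\\
\sum_{t=1}^{N} \beta_t\,\beta_{t+k} &= 0 \quad \text{for } k = 1, 2, 3, \dots, \text{ where } N \text{ is large compared with } k,\\
\sum_{t=1}^{N} \alpha_{t+k}\,\beta_{t+k'} &= 0 \quad \text{where } k \text{ and } k' \text{ take any of the values } 1, 2, 3, \dots \text{ etc.}
\end{aligned}\right\} \qquad \text{(xlvi)}.$$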


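But the correlation between successive values of the $y$'s at intervals of $k$ is

$$\rho_k = \frac{\sum_{t=1}^{N}(\alpha_t+\beta_t)(\alpha_{t+k}+\beta_{t+k}) - N\,\dfrac{\sum_{t=1}^{N}(\alpha_t+\beta_t)}{N}\cdot\dfrac{\sum_{t=1}^{N}(\alpha_{t+k}+\beta_{t+k})}{N}}{\sqrt{\left[\sum_{t=1}^{N}(\alpha_t+\beta_t)^2 - N\left(\dfrac{\sum_{t=1}^{N}(\alpha_t+\beta_t)}{N}\right)^2\right]\left[\sum_{t=1}^{N}(\alpha_{t+k}+\beta_{t+k})^2 - N\left(\dfrac{\sum_{t=1}^{N}(\alpha_{t+k}+\beta_{t+k})}{N}\right)^2\right]}}$$

$$= \frac{\sum_{t=1}^{N}\alpha_t\alpha_{t+k} - N\,\dfrac{\sum_{t=1}^{N}\alpha_t}{N}\cdot\dfrac{\sum_{t=1}^{N}\alpha_{t+k}}{N}}{\sqrt{\left[\sum_{t=1}^{N}\alpha_t^2 - N\left(\dfrac{\sum_{t=1}^{N}\alpha_t}{N}\right)^2 + \sum_{t=1}^{N}\beta_t^2\right]\left[\sum_{t=1}^{N}\alpha_{t+k}^2 - N\left(\dfrac{\sum_{t=1}^{N}\alpha_{t+k}}{N}\right)^2 + \sum_{t=1}^{N}\beta_{t+k}^2\right]}}$$

in view of the relations (xlvi)

$$= \frac{[\alpha_t\alpha_{t+k}]}{\sqrt{\left(\overline{\alpha_1^2}+\overline{\beta_1^2}\right)\left(\overline{\alpha_{k+1}^2}+\overline{\beta_{k+1}^2}\right)}},$$

where $[\alpha_t\alpha_{t+k}]$ is the first order product moment coefficient referred to mean of the
successive $\alpha$'s at intervals of $k$,

and $\sqrt{\overline{\alpha_1^2}}$ is the standard deviation of $\alpha_1, \alpha_2, \dots, \alpha_N$,
    $\sqrt{\overline{\beta_1^2}}$ is the standard deviation of $\beta_1, \beta_2, \dots, \beta_N$,
    $\sqrt{\overline{\alpha_{k+1}^2}}$ is the standard deviation of $\alpha_{k+1}, \alpha_{k+2}, \dots, \alpha_{k+N}$,
    $\sqrt{\overline{\beta_{k+1}^2}}$ is the standard deviation of $\beta_{k+1}, \beta_{k+2}, \dots, \beta_{k+N}$,
and $\sqrt{\overline{\alpha_{k+1}^2}+\overline{\beta_{k+1}^2}}$ is the standard deviation of $y_{k+1}, y_{k+2}, \dots, y_{k+N}$.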

” ” ” Yk> Ykriy see YE+LN: 
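Now unless there is a steady sessional change in the $\alpha$'s, we may assume that
for large values of $N$

$$\overline{\alpha_1^2} = \dots = \overline{\alpha_k^2} = \dots = \overline{\alpha^2}, \text{ say},$$

and similarly unless the accidental errors are steadily increasing or decreasing in
magnitude

$$\overline{\beta_1^2} = \dots = \overline{\beta_k^2} = \dots = \overline{\beta^2},$$

and we have

$$\rho_k = \frac{[\alpha_t\alpha_{t+k}]}{\overline{\alpha^2}+\overline{\beta^2}} = \frac{\overline{\alpha^2}}{\overline{\alpha^2}+\overline{\beta^2}}\; r_{\alpha_t,\alpha_{t+k}}.$$

But on the assumption made above of zero partial correlation between two
estimates which are not consecutive, we have found that $r_{\alpha_t,\alpha_{t+k}}$, the correlation
between the observer's real estimates at intervals of $k$, can be expressed in the
form $r^k$, and therefore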





$$\rho_k = \frac{\overline{\alpha^2}}{\overline{\alpha^2}+\overline{\beta^2}}\; r^k = q\,r^k \qquad \text{(xlvii)},$$

where $q$ is a constant not depending on the interval $k$. With this expression for
the correlation we shall of course find an apparent partial correlation between the
judgments at intervals greater than one; for example the partial correlation
between $y_t$ and $y_{t+2}$, $y_{t+1}$ being constant, is $\dfrac{q(1-q)\,r^2}{1-q^2r^2}$ and does not vanish unless
$q = 1$. According to the theory suggested this is however a spurious correlation
due solely to the presence of the accidental errors.
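
A small simulation may make the effect of the accidental errors concrete. The sketch below rests on assumptions that are not in the text: the true estimates are generated as a Gaussian first-order chain with lag-one correlation r, the accidental errors are independent Gaussian draws, and the numerical values (r = 0.7, variances 0.8 and 0.2) are purely illustrative. Under those assumptions the lag correlations of the recorded judgments should lie close to $q\,r^k$ rather than $r^k$.

```python
import random
from math import sqrt


def simulate_judgments(n, r, var_alpha, var_beta, seed=0):
    """Recorded judgments y_t = alpha_t + beta_t (assumed Gaussian model)."""
    rng = random.Random(seed)
    sd_alpha, sd_beta = sqrt(var_alpha), sqrt(var_beta)
    sd_step = sd_alpha * sqrt(1.0 - r * r)  # keeps the variance of alpha constant
    alpha = rng.gauss(0.0, sd_alpha)
    ys = []
    for _ in range(n):
        ys.append(alpha + rng.gauss(0.0, sd_beta))   # record with accidental error
        alpha = r * alpha + rng.gauss(0.0, sd_step)  # next true estimate
    return ys


def lag_correlation(ys, k):
    """Product-moment correlation between y_t and y_{t+k}, for k >= 1."""
    a, b = ys[:-k], ys[k:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spurious_partial(q, r):
    """Partial correlation of y_t and y_{t+2} given y_{t+1} when rho_k = q r**k."""
    return q * (1 - q) * r * r / (1 - q * q * r * r)


r, var_alpha, var_beta = 0.7, 0.8, 0.2          # illustrative values only
q = var_alpha / (var_alpha + var_beta)          # the constant q of (xlvii)
ys = simulate_judgments(100_000, r, var_alpha, var_beta)
for k in (1, 2, 3):
    print(k, round(lag_correlation(ys, k), 3), round(q * r ** k, 3))
print("spurious partial correlation:", round(spurious_partial(q, r), 3))
```

With $q = 1$ (no accidental errors) `spurious_partial` returns zero, in agreement with the remark that the apparent partial correlation is due solely to the accidental errors.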

The next problem is to inquire how far a relation of the type of (xlvii) 
will fit the correlation coefficients which have been calculated for the Experiments 
A, B, C and D. In the first place, in order to get as smooth values for the 
coefficients as possible we must combine the 20 series, which we may do if we 
remove the secular change as represented by the variation in the series means ; 
this step is clearly necessary for we are considering the relationship between 
judgments made in close proximity and are not concerned for the moment with 
the variation in personal equation from day to day. We must therefore deal with 
the coefficients of correlation $R_k'$ and endeavour to fit a curve $z = qr^k$ through the
points $x = k$, $z = R_k'$. I will consider the different experiments in turn.

(b) Application of Theory to results of Experiments. 

Experiment A.

The curve represented by $z = qr^k$ is asymptotic to the $x$ axis (as $r < 1$), so that
if it is to fit the points $(k, R_k')$ it is necessary that $R_k'$ should tend to zero as $k$
increases. But the values of $R_k'$ given in Table V, p. 58, appear to tend as $k$
increases, to a limiting value between +.16 and +.15 rather than to zero.
I think that this results from the marked sessional changes which have been
represented in mean form by a second order parabola (see Equation (xxx) and
Figure 4), and that if there is a physiological significance in the distinction
between the sessional change and the residual variations of the observations when
freed from this change, it will be of interest to find out how the coefficients of
correlation of these successive residuals, what have been termed the $R_k''$'s, fall
off as the interval or $k$ is increased. Should it be found that the $R_k''$'s follow
the law

$$R_k'' = qr^k,$$

the argument in favour of distinguishing the sessional change from the residual
variations will be strengthened.

It was found that the values of $R_k'$ given in Table V could be fitted closely by
a curve of the form

$$R_k' = p + qr^k \qquad \text{(xlviii)},$$

where $p$, $q$ and $r$ are constants.

A rough trial gave the following approximate values:

$$p_0 = .157, \quad q_0 = .69, \quad r_0 = .73.$$

Now if

$$z = f(p, q, r) = f(p_0, q_0, r_0) + \delta p\,\frac{\partial f_0}{\partial p_0} + \delta q\,\frac{\partial f_0}{\partial q_0} + \delta r\,\frac{\partial f_0}{\partial r_0}, \text{ to first order},$$

$$= p_0 + q_0r_0^k + \delta p + r_0^k\,\delta q + kq_0r_0^{k-1}\,\delta r,$$

we have as equations of condition for a least square solution

$$\delta p + r_0^k\,\delta q + kq_0r_0^{k-1}\,\delta r = R_k' - p_0 - q_0r_0^k, \quad \text{for } k = 1, 2, \dots 13.$$

Using the values of $p_0$, $q_0$ and $r_0$ given above, the corrections $\delta p$, $\delta q$ and $\delta r$
were calculated and gave finally as the best fitting numerical equation,

$$R_k' = .1524 + .6817\,(.7105)^k \qquad \text{(xlix)}.$$
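
The linearised least-squares correction can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reproduction of the original arithmetic: the helper names are invented here, the correction is iterated rather than applied once, and the data are synthetic values generated from equation (xlix) so that the expected answer is known in advance. The starting values are those of the rough trial.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= factor * m[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / m[i][i]
    return x


def correction_step(ks, values, p0, q0, r0):
    """Least-squares corrections (dp, dq, dr) from the equations of condition."""
    rows = [[1.0, r0 ** k, k * q0 * r0 ** (k - 1)] for k in ks]
    resid = [v - p0 - q0 * r0 ** k for k, v in zip(ks, values)]
    # Normal equations: (A^T A) d = A^T e
    normal = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
              for i in range(3)]
    rhs = [sum(row[i] * e for row, e in zip(rows, resid)) for i in range(3)]
    return solve3(normal, rhs)


def fit(ks, values, p, q, r, steps=20):
    """Repeat the correction step (an assumption; the text describes one step)."""
    for _ in range(steps):
        dp, dq, dr = correction_step(ks, values, p, q, r)
        p, q, r = p + dp, q + dq, r + dr
    return p, q, r


ks = list(range(1, 14))
synthetic = [0.1524 + 0.6817 * 0.7105 ** k for k in ks]  # generated from (xlix)
p, q, r = fit(ks, synthetic, 0.157, 0.69, 0.73)          # rough-trial start
print(round(p, 4), round(q, 4), round(r, 4))
```

Each row of `rows` holds the three coefficients $1$, $r_0^k$ and $kq_0r_0^{k-1}$ of one equation of condition, so `correction_step` is a direct transcription of the linearisation given in the text.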


TABLE XVIII.
Values of the $R_k$'s for Trisection Experiments.

Columns 6 and 7 are the values obtained from (lii) on the assumption of constancy of $G_k$.

| 1   | 2                           | 3                              | 4                           | 5                        | 6       | 7       | 8                              |
| $k$ | $R_k'$ (direct calculation) | $R_k'$ (from equation (xlix))  | Difference, Col. 2 − Col. 3 | Probable error of $R_k'$ | $S_k''$ | $R_k''$ | $R_k''$ (from equation (lvi))  |
|-----|-------|-------|-------|-------|-------|-------|-------|
| 0   |       |       |       |       |       |       | +.804 |
| 1   | +.625 | .637  | −.012 | ±.013 | .0773 | +.550 | .571  |
| 2   | .523  | .497  | +.026 | ±.016 | .0776 | .431  | .406  |
| 3   | .388  | .397  | −.009 | ±.018 | .0778 | .268  | .288  |
| 4   | .315  | .326  | −.011 | ±.019 | .0781 | .183  | .205  |
| 5   | .281  | .276  | +.005 | ±.020 | .0778 | .142  | .146  |
| 6   | .232  | .240  | −.008 | ±.020 | .0782 | .084  | .103  |
| 7   | .222  | .215  | +.007 | ±.020 | .0782 | .071  | .074  |
| 8   | .191  | .197  | −.006 | ±.021 | .0783 | .035  | .052  |
| 9   | .165  | .184  | −.019 | ±.021 | .0787 | .006  | .037  |
| 10  | .183  | .175  | +.008 | ±.021 | .0802 | .031  | .026  |
| 11  | .168  | .168  | .000  | ±.021 | .0823 | .017  | .019  |
| 12  | .172  | .164  | +.008 | ±.021 | .0834 | .023  | .013  |
| 13  | +.160 | +.160 | .000  | ±.021 | .0840 | +.009 | +.009 |








In the second column of Table XVIII are given the values of $R_k'$ taken from
Table V and in the fifth column their probable errors; the values of $R_k'$ given by
equation (xlix) are in the third column, and in the fourth are the differences
col. 2 − col. 3. It will be seen that the fit is a good one, the difference being only
greater than the probable error in the case of $R_2'$. The points $(k, R_k')$ and the
curve of (xlix) are shown in Figure 8 (p. 64).


The problem before us is therefore this; can we explain the constant p in 
equation (xlviii) in terms of the sessional changes? We have seen that the mean 
sessional change for the 20 series can be represented by a parabola of the second 
order, but we must allow for a different change in each series. Let us suppose that 


$$y = f_p(t)$$

will represent the sessional change in the $p$th Series after the secular term
represented by the series mean has been removed, so that instead of equation
(xlv) of p. 89, we have

$$y_t = f_p(t) + \alpha_t + \beta_t = f_p(t) + Y_t \qquad \text{(l)},$$

where $Y_t = \alpha_t + \beta_t$.







Then if $\underset{m}{S}$ indicates summation for the $m$ (or 20) series, $n = 50$, the number
of observations in each group of a series, and $k$ takes any of the group numbers
1, 2, ... 14, since $y = f_p(t)$ will be the “best” fitting curve of its type
$\sum_{t=1}^{n} Y_{t+k-1} = 0$ approximately, and on combining the $m$ series

$$\underset{m}{S}\sum_{t=1}^{n}\left(Y_{t+k-1}\right) = 0.$$

Again we have no reason to suppose that there will be any correlation between
the sessional term $f_p(t)$ and the residual $Y_t$, so that

$$\underset{m}{S}\sum_{t=1}^{n}\left\{Y_{t+k-1}\,f_p(t+k'-1)\right\} = 0$$

for all values of $k$ and $k'$ between 1 and 14.
As $y_t$ is freed from the secular term, using the relations above we have that

$$R_k' = \frac{\underset{m}{S}\sum_{t=1}^{n}\{f_p(t)+Y_t\}\{f_p(t+k)+Y_{t+k}\} - mn\left\{\dfrac{1}{mn}\underset{m}{S}\sum_{t=1}^{n}f_p(t)\right\}\left\{\dfrac{1}{mn}\underset{m}{S}\sum_{t=1}^{n}f_p(t+k)\right\}}{\sqrt{\left[\underset{m}{S}\sum_{t=1}^{n}\{f_p(t)+Y_t\}^2 - mn\left\{\dfrac{1}{mn}\underset{m}{S}\sum_{t=1}^{n}f_p(t)\right\}^2\right]\left[\underset{m}{S}\sum_{t=1}^{n}\{f_p(t+k)+Y_{t+k}\}^2 - mn\left\{\dfrac{1}{mn}\underset{m}{S}\sum_{t=1}^{n}f_p(t+k)\right\}^2\right]}}$$

$$= \frac{S_1''\,S_{k+1}''\,R_k'' + F_k}{\sqrt{\left({S_1''}^2 + G_1^2\right)\left({S_{k+1}''}^2 + G_{k+1}^2\right)}} \qquad \text{(li)},$$

where $R_k''$ is the coefficient of correlation between $Y_t$ and $Y_{t+k}$, $S_1''$ and $S_{k+1}''$ are
the standard deviations of the $Y$'s of Groups 1 and $k+1$ (see (xi) and (xii) on
page 35), and

$$F_k = \frac{1}{mn}\underset{m}{S}\sum_{t=1}^{n} f_p(t)\,f_p(t+k) - \left\{\frac{1}{mn}\underset{m}{S}\sum_{t=1}^{n} f_p(t)\right\}\left\{\frac{1}{mn}\underset{m}{S}\sum_{t=1}^{n} f_p(t+k)\right\},$$

$$G_k^2 = \frac{1}{mn}\underset{m}{S}\sum_{t=1}^{n}\{f_p(t+k-1)\}^2 - \left\{\frac{1}{mn}\underset{m}{S}\sum_{t=1}^{n} f_p(t+k-1)\right\}^2.$$

It will be seen that $G_k$ is the standard deviation of the ordinates of the curves
representing the sessional changes, $y = f_p(t)$, which correspond to the observations
in the $k$th groups, while $\dfrac{F_k}{G_1G_{k+1}}$ is the correlation of these successive ordinates at
intervals of $k$. If the sessional changes were linear this correlation would be unity,
and a little consideration will show that if the sessional change in each series can
be represented by a curve of gradual bend, the correlation will not be far from
this value. For example in the case of the parabola (Equation (xxx), p. 47)
which was fitted to the mean sessional change and is drawn in Figure 4, it is
found that

$$\frac{F_k}{G_1G_{k+1}} = +.994.$$

We shall therefore make no great error in assuming that

$$F_k = G_1G_{k+1},$$


and it follows that the relation for $R_k'$ can be expressed in the form

$$R_k' = \frac{1}{\sqrt{\left(1 + \dfrac{{S_1''}^2}{G_1^2}\right)\left(1 + \dfrac{{S_{k+1}''}^2}{G_{k+1}^2}\right)}} + \frac{R_k''}{\sqrt{\left(1 + \dfrac{G_1^2}{{S_1''}^2}\right)\left(1 + \dfrac{G_{k+1}^2}{{S_{k+1}''}^2}\right)}} \qquad \text{(lii)}$$

$$= p_k + l_k\,R_k'',$$

which must be compared with the relation

$$R_k' = p + qr^k \qquad \text{(xlviii) bis},$$

where $p = .1524$, $q = .6817$, $r = .7105$,

that has been found empirically to fit the actual values of $R_k'$.

If the expressions $p_k$ and $l_k$ were constant for $k = 1, 2 \dots 14$ an interpretation
of (lii) would be at once suggested. Namely that $R_k''$, the coefficient of correlation
of the successive residuals $Y_t$ and $Y_{t+k}$ left after the removal of the secular and
sessional changes is expressible in the form

$$R_k'' = \text{constant} \times r^k \qquad \text{(liii)},$$

that is to say, making allowance for the presence of accidental errors, the law of
relationship between the successive estimates suggested on p. 90 above, holds
good. Now without finding the curve which represents the sessional change in
each series we do not know the values of $S_k''$ and $G_k$. We have however that

$${S_k'}^2 = {S_k''}^2 + G_k^2 \qquad \text{(liv)},$$

where $S_k'$ is the standard deviation of the observations in the $k$th groups after the
removal of the secular term. The values of $S_k'$ are given in Table V, p. 58;
they are seen to increase as $k$ increases and therefore $p_k$ and $l_k$ can only be
constant for all values of $k$ if

$$\frac{{S_1''}^2}{G_1^2} = \frac{{S_2''}^2}{G_2^2} = \dots = \frac{{S_{14}''}^2}{G_{14}^2} \qquad \text{(lv)}.$$


That the relations (lv) should hold approximately is not at all improbable; for
with a sessional change of the parabolic form of the curve (xxx) illustrated in
Figure 4, the standard deviations of the ordinates in the later groups will increase
owing to the increasing drop of the curve towards the end of the series while $S_k''$
may increase with $k$ owing to greater variation towards the end of a session.

In fact for this particular mean series with its sessional curve represented by
(xxx) it is found that

$G_1 = .0336$ ins., $G_{14} = .0406$ ins.,
while $S_1'' = .0165$ ins., $S_{14}'' = .0201$ ins.,

that is to say, the variations superimposed upon the main sessional change (the
distances of the points plotted in Figure 4 from the parabola) become greater
towards the end of the series when the observer’s judgment perhaps became more
erratic as he grew tired. These values give $\dfrac{S_1''}{G_1} = .49$, $\dfrac{S_{14}''}{G_{14}} = .50$ suggesting that
the relations (lv) do hold very closely. What we find therefore in this typical mean
series represented by Figure 4 may well be expected to hold approximately in the
individual series.

If then $p_k$ is constant for $k = 1, 2, \dots 14$ and equals $p$, we find readily from
equations (lii) and (lv) that

$$l_k = 1 - p,$$

and hence $(1-p)\,R_k'' = qr^k$ or $R_k'' = \dfrac{q}{1-p}\,r^k$.

Making use of the numerical values $p = .1524$, $q = .6817$, $r = .7105$ we obtain
finally

$$R_k'' = .8043\,(.7105)^k \qquad \text{(lvi)},$$

as the theoretical expression for the correlation of the successive residuals after
the observations have been freed from secular and sessional change. This curve is
the lower of the two curves drawn in Figure 8. The points which are there
plotted about this curve are the points $(k, R_k'')$* obtained from equation (lii)

(a) On the assumption that $G_1^2 = G_2^2 = \dots = G_{14}^2 = $ constant,

(b) $\dfrac{1}{\sqrt{\left(1 + \dfrac{{S_1''}^2}{G_1^2}\right)\left(1 + \dfrac{{S_2''}^2}{G_2^2}\right)}} = p_1 = .1524$,

(c) Making use of equations (liv) and the tabled values of $S_k'$.

The close fit of the curve to these points shows that the manner in which the
values of $R_k''$ fall off as $k$ increases is not much affected by the different
assumptions regarding the relations of the $S_k''$'s and the $G_k$'s made in the two cases†.
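
The passage from the empirical constants of (xlviii) bis to the residual law is a one-line calculation, sketched below as an arithmetic check. The function name is an assumption made here; the constants are those quoted in the text, and the tabulated values are intended to be compared with the eighth column of Table XVIII.

```python
p, q, r = 0.1524, 0.6817, 0.7105  # constants of (xlviii) bis


def residual_correlation(k):
    """R_k'' = q / (1 - p) * r**k, from (1 - p) R_k'' = q r**k."""
    return q / (1.0 - p) * r ** k


print("leading constant:", round(q / (1.0 - p), 4))
for k in range(0, 14):
    # Rounded to three places for comparison with column 8 of Table XVIII
    print(k, round(residual_correlation(k), 3))
```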

Experiment B. 

Reference has been made on p. 71 to evidence for a slight periodicity in the
observations of this Experiment, which gives rise to small but apparently
significant negative values to $R_k'$, for $k \geq 7$. Further investigation might enable a
correction for this periodicity to be made, but at present it is not possible to
express $R_k'$ with exactness in the form

$$R_k' = qr^k.$$

For the purpose of comparison with the other experiments we can however
obtain values of $q$ and $r$ which will give a rough fit for the first few values of $R_k'$.
Thus if we take

$$r = .72, \quad q = .47‡,$$

we get the values

$$R_1' = .34, \quad R_2' = .24, \quad R_3' = .18, \quad R_4' = .13,$$

which agree roughly with the actual values given in Table XI, namely

$$R_1' = .352, \quad R_2' = .231, \quad R_3' = .183, \quad R_4' = .085.$$
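
The four fitted values follow directly from the rough constants; a minimal check, with variable names chosen here for illustration:

```python
q, r = 0.47, 0.72  # rough-trial constants for Experiment B

# Fitted correlations q * r**k for k = 1..4, rounded to two places as in the text
fitted = [round(q * r ** k, 2) for k in range(1, 5)]
print(fitted)
```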

* In Figure 8 these points have been indicated by $R_k'''$ to distinguish them from the correlation
coefficients of residuals after removal of linear sessional change, there denoted by $R_k''$.

† The values of $R_k''$ calculated from equation (lvi) and of $R_k''$ and $S_k''$ calculated on the assumption
of the constancy of $G_k$ are given in the 8th, 7th and 6th columns of Table XVIII.

‡ $r = .72$ is the value of the mean of the ratios $\dfrac{R_2'}{R_1'}$, $\dfrac{R_3'}{R_2'}$ and $\dfrac{R_4'}{R_3'}$, and using this value for $r$,
$q$ was taken as .47 by rough trial.










Experiment C. 

At the end of the section dealing with the reduction of the observations 
for this experiment the conclusion reached was that R,.” and R,; were not 
significantly negative ; no difficulty therefore arises in fitting a curve of the form 
$y = qr^k$ to the values of $R_k'$ given in the 6th row of Table XIV, p. 80. This
was effected by the method of least squares, with the result 

ee a, | | ea ne (Ivii). 

In the 7th row of Table XIV are given the values of $R_k'$ calculated from this
equation, and in the 8th row the differences

($R_k'$ from observations) − ($R_k'$ from curve).

If these differences are compared with the probable errors of $R_k'$, it will be
seen that the fit is very satisfactory, for the later calculated values of $R_k'$ are in
any case uncertain; R,’ and R,,’ were indeed not used in the least square 
solution as they were known to have too high negative values. 

Experiment D. 

On p. 85 it was suggested that $R_k$ would approach the value +.354 as $k$
increased. In this case a curve of the form

$$R_k = .354 + qr^k,$$

was fitted to the calculated values of $R_k$. The fitting was carried out by moments.
Making $R_k - .354 = z$, we have

$$S(z) = qr\,\frac{1 - r^s}{1 - r} = N, \text{ say, where } s \text{ is the number of ordinates},$$

$$S(zk) = q\,(r + 2r^2 + 3r^3 + \dots + sr^s) = N \times \nu_1',$$

whence

$$\nu_1' = \frac{1}{1-r} - \frac{sr^s}{1-r^s} \qquad \text{(lviii)},$$

$$q = \frac{N(1-r)}{r\,(1-r^s)} \qquad \text{(lix)},$$

and $\nu_1'$ is the distance of the mean from an origin at unit distance from the first
ordinate $qr$.

The constants $\nu_1'$ and $N$ are known; solving (lviii) by approximation we have $r$,
and then (lix) gives $q$.
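
The fitting by moments can be sketched as follows. Several things here are assumptions rather than the original procedure: the solution "by approximation" is carried out by bisection, the moment ratio is evaluated as a direct sum rather than by the closed form, the number of ordinates is taken as 13, and the data are synthetic values generated from the constants quoted for this experiment. The value of q is then obtained from the sum of the ordinates, $q = N(1-r)/\{r(1-r^s)\}$.

```python
def moment_ratio(r, s):
    """nu_1' = sum(k r^k) / sum(r^k) for k = 1..s (mean position of the ordinates)."""
    weights = [r ** k for k in range(1, s + 1)]
    return sum(k * w for k, w in zip(range(1, s + 1), weights)) / sum(weights)


def fit_by_moments(z):
    """Fit z_k = q r^k (k = 1..s) from the area N and the first moment nu_1'.

    Assumes decaying positive ordinates, so that 1 < nu_1' < (s + 1) / 2.
    """
    s = len(z)
    n_total = sum(z)                                            # N = S(z)
    nu1 = sum(k * zk for k, zk in enumerate(z, start=1)) / n_total
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(200):                                        # bisection on r
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid, s) < nu1:                          # ratio increases with r
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    q = n_total * (1.0 - r) / (r * (1.0 - r ** s))
    return q, r


# Synthetic ordinates built from the quoted constants; s = 13 is an assumption.
z = [0.1153 * 0.8121 ** k for k in range(1, 14)]
q_fit, r_fit = fit_by_moments(z)
print(round(q_fit, 4), round(r_fit, 4))
```

Only two moments are used, so the method needs no starting values, which may be why it was preferred here to the least-square correction used for Experiment A.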


The values are $q = .1153$, $r = .8121$,

and finally,

$$R_k = .354 + .1153\,(.8121)^k \qquad \text{(lx)}.$$


Then using the approximate relation 


as, 
8/2 +—3(D,- dy 
mM m 





R,’ = (R, — 354) x 


—which is a modified form of Equation (xliii)—we obtain for R,’ the equation 


My > 1IGE CGIIIP nid cessive cdice (Ixi). 










Both of the curves, represented by equations (lx) and (lxi), have been drawn in
Figure 20, and show a satisfactory fit, if the roughness of the data is taken into
account.


The results of the Trisection and of the Ten-second Counting Experiments, and 
as far as the rough form of the data will allow, of the Ten-second Estimating 
Experiment, suggest therefore that there is some foundation for the theory of 
relationship between successive estimates put forward at the beginning of the 
present Section. To reach the expression $qr^k$ for the correlation of successive
judgments at intervals of $k$, it has been necessary in all cases to remove the
secular change, and in one case a sessional change as well, but if these changes
correspond in themselves to some definite mental or physical processes which can
be separated in some degree from the causes underlying the residual variations,
then we are justified in inquiring into the significance of the constants $q$ and $r$.
It has been suggested that

$$q = \frac{\overline{\alpha^2}}{\overline{\alpha^2}+\overline{\beta^2}} \qquad \text{(lxii)},$$

so that $q$ is dependent on the ratio between the correlated and the uncorrelated
parts of the observer’s judgment, that is between what I have considered as the
true estimate and the accidental errors superimposed in the process of record.
Now using (lxii) and the relation*

$$S'^2 = \overline{\alpha^2}+\overline{\beta^2} \qquad \text{(lxiii)}$$

(or $S''$ for the Trisections where it has been necessary to allow for a sessional
change), we find that

$$\sqrt{\overline{\alpha^2}} = \sqrt{q}\;S', \qquad \sqrt{\overline{\beta^2}} = \sqrt{1-q}\;S' \qquad \text{(lxiv)},$$

and the values calculated in this way for $\sqrt{\overline{\alpha^2}}$ and $\sqrt{\overline{\beta^2}}$ are given in Table XIX.


TABLE XIX.

| Experiment                     | $q$ | $S'$                      | $\sqrt{\overline{\alpha^2}}$ | $\sqrt{\overline{\beta^2}}$ | $r$ |
|--------------------------------|-----|---------------------------|------|------|-----|
| Trisection                     | .80 | .080 ($= S''$) in inches  | .071 | .036 | .71 |
| Bisection (approximate only)   | .47 | .045 in inches            | .031 | .033 | .72 |
| Ten-second Counting            | .67 | .034 in factor            | .028 | .020 | .79 |
| Ten-second Estimating          | .18 | .141 in factor            | .060 | .128 | .81 |
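
The two standard deviations in each row of Table XIX follow from $q$ and $S'$ by the relations (lxiv). The sketch below recomputes them from the rounded tabled values of $q$ and $S'$; the function and dictionary names are introduced here for illustration, and because the inputs are already rounded the results can be expected to agree with the table only to about the third decimal place.

```python
from math import sqrt


def split_sd(q, s_total):
    """Return (sd of true estimates, sd of accidental errors) from (lxiv)."""
    return sqrt(q) * s_total, sqrt(1.0 - q) * s_total


# (q, S') as read from Table XIX; S'' is used for the Trisections.
experiments = {
    "Trisection": (0.80, 0.080),
    "Bisection": (0.47, 0.045),
    "Ten-second Counting": (0.67, 0.034),
    "Ten-second Estimating": (0.18, 0.141),
}

results = {name: split_sd(q, s) for name, (q, s) in experiments.items()}
for name, (sd_alpha, sd_beta) in results.items():
    print(f"{name:24s} {sd_alpha:.4f} {sd_beta:.4f}")
```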





* It will be seen that owing to a sessional change in standard deviation, $S_k''$ for the Trisections
(Table XVIII) and $S_k'$ for the Bisections (Table XI) increase with $k$. To obtain an approximate value
for the standard deviation of the whole 1200 observations as opposed to that for the 1000 observations
of any particular Group $k$, I have used in equations (lxiii) and (lxiv) $S'$ (or $S''$) given by

$$S'^2 = \tfrac{1}{14}\left({S_1'}^2 + {S_2'}^2 + {S_3'}^2 + \dots + {S_{14}'}^2\right).$$

If the Trisection and Bisection results are compared it will be seen that the
standard deviations of the accidental errors ($\sqrt{\overline{\beta^2}}$) are nearly the same but that
there is a large difference between the measures of the variations of the true
estimates ($\sqrt{\overline{\alpha^2}}$). This is a result which we should anticipate, for the method of
recording the estimate was the same in each experiment, and accidental errors of
the same magnitude would occur in both cases; on the other hand the observer
was faced with a more difficult problem in estimating a third than in estimating
a half, and this is shown by the greater variability of his estimate in the former
case (.07 against .03)*. For the Timing experiments, we find no correspondence
between the $\sqrt{\ }$'s; the great difference between the counting of ten seconds and
the attempted concentration of mind on the passing of an unbroken ten second
interval has been emphasized in the description of the experiments above, and a
correspondence was hardly to be expected. The standard deviations are in terms
of the factors e/p and must be multiplied by 10.2 if required in seconds.


If now we turn to the values of r given in the last column of Table XIX, it 
will be seen that they lie near together, and although that for the Bisections 
is not an exact measure, there is a suggestion of close agreement between the r’s 
in the pairs of similar experiments, for we have estimations of length with ‘71 
and ‘72, and estimations of time with ‘79 and ‘81. This coefficient is a measure of 
the rate at which the correlation of successive judgments falls off or the influence 
of previous estimates vanishes from the observer's mind: on the theory of zero 
partial correlation it is simply the coefficient of correlation between a true estimate 
freed from accidental errors and the preceding estimate. 

On any theory $r$ would seem to be a fundamental constant not varying greatly
for different types of observations, but perhaps varying considerably for different
observers. The fact that it is so nearly the same for experiments with a five second
interval between observations (Trisection and Bisection) and for others with an
interval of ten seconds or more (Counting and Estimating) shows that the
correlation of successive judgments is a function not only of the time interval between
two judgments but also of the number of intervening judgments. For if it were
purely a function of the time interval we should expect to find a greater
difference between the values of $r$ found for experiments with a five second interval and
a ten second interval. Indeed if the experiments were exactly the same but for
difference in interval, $R_1'$ for that with ten seconds would equal $R_2'$ for that with
five seconds. Further experiments of the same type in which the interval between
the recording of judgments was varied would undoubtedly throw much light on 
this point. 


XII. PREDICTION. 


If the values of “m” successive judgments are known and there is no corre- 
lation between them, the “most probable” value of the (m+ 1)th judgment, that is 
the most reasonable guess at its value that can be made, is the mean of the “m” 
judgments. If however the successive judgments are correlated, then it is possible 
to predict the value of the (m+ 1)th with much greater expectation of accuracy. 


* This may be compared with the ratio of 3 to 2 given on p. 73 from a comparison of the 
S,’s before making any allowance for the accidental errors. 













In the Experiments B, C and D it has been found that the correlation between
judgments at intervals of $k$, made in the same session, can be expressed
approximately in the form

$$R_k' = qr^k \qquad \text{(lxv)},$$

while for Experiment A, owing to the large sessional change, the expression was

$$R_k' = p + qr^k \qquad \text{(lxvi)}.$$

The decrease of correlation in geometrical progression expressed by (lxv)
follows precisely the law of ancestral heredity, for which the multiple regression
equations required for prediction have already been worked out*. It is not
therefore proposed to go further into the problem in the present Paper, nor to
inquire whether the general multiple regression equations would reduce to as
simple a form when the correlation is expressed by equation (lxvi) rather
than (lxv).


XIII. Summary AND CONCLUSIONS. 


The secular change in personal equation is shown by the variation in the series’ 
means, but it is only in Experiment A and perhaps Experiment C, where the general 
trend of the variations is markedly in one direction, that we find that type of 
change which is usually understood when a secular change is referred to. In the 
Bisection Experiment B the linear secular change is very small and its existence 
might well not be recognized, and yet the series’ means are subject to fluctuations 
far exceeding those of random sampling. For the probable error of the mean of a
series (or of the observations in Group 1) is

$$\pm .67449 \times \frac{S_1'}{\sqrt{50}} = \pm .00416,$$

but if we take the distribution consisting of the 20 series means, we find that
the standard deviation is .037375, giving for the probable error of such a mean

$$\pm .02521,$$

which is more than six times as large as the probable error we have calculated by
considering the variations within a series. It is therefore clear that the 50
observations in a series are not random samples of the whole “universe” of
observations, as they should be on the Gaussian hypothesis of normal errors.
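
The comparison of the two probable errors is simple arithmetic and can be verified directly. In the sketch below the variable names are introduced for illustration; the within-series probable error is taken as quoted in the text, and the value of $S_1'$ printed at the end is only a back-calculation from that figure, not a value given in this section.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # ratio of probable error to standard deviation

pe_within = 0.00416                             # quoted: from variation within a series
sd_of_series_means = 0.037375                   # quoted: s.d. of the 20 series means
pe_between = PROBABLE_ERROR_FACTOR * sd_of_series_means

ratio = pe_between / pe_within
implied_s1 = pe_within * sqrt(50) / PROBABLE_ERROR_FACTOR  # back-calculated only

print(round(pe_between, 5), round(ratio, 2), round(implied_s1, 4))
```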

It is again only in Experiment A that there is a fairly consistent sessional 
change from series to series which an observer might easily recognize and possibly 
allow for, and yet if we turn to any of the graphs for the Bisection or Seconds- 
counting which show the variations of judgment within a series (Figures 11 and 15), 
it will be seen how very often the mean of ten consecutive judgments will give but 
a poor approximation to the mean of the series; we cannot take the judgments
within one series as scattered at random. When dealing with a sample of m 


* The Galton-Pearson Law of Ancestral Heredity; the offspring and the mean of the Ath grand- 


parents have qr* for their correlation. 
7—2 








100 On the Variations in Personal Equation 


correlated variates, the usual expression for the probable error of the mean is 
° (1 Sei r) : WA om ° 
(1) + 67449 = “Om, as compared with (2) + ‘67449 —*, when the variates 
Vm m 

are not correlated, but owing to the sessional variations to which a large part of 
the correlation is due, the expression (1) being the smaller, is in the present case 
a worse measure than (2), of the probable limits of divergence of the mean of the 
sample from the mean of the series. The graphs of Figures 6, 11 and 15 show 
that there is a tendency for the judgments to vary in waves, to be first on one 
side of the mean for the series, and then to change to the other, but with no 
definite period of variation. It is owing to these large correlated variations which
cannot be expressed in any simple sessional term, that the coefficients of corre-
lation, $r_{\rho_s \sigma_s}$, between $\sigma_s$ and $\rho_s$ have been found to have positive values ranging
from +.52 ± .11 in Experiment A to +.18 ± .15 in D, showing that greater
variation is associated with higher correlation of successive judgments.

An analysis has suggested that the coefficients of correlation of the crude values 
of the observations at intervals of k can be expressed in the generalized form 


” Mm” mn 1 
S, Shai R, bi F, + m p> (D, ae dy) (Desa Sei dy.1) 





R, = 
OF fo Y¢ ] 2 Io 2 1 > r ° 
/ 1s 24 G24 - = (D, — d, r {Seu ih Ges ‘wit mat Pes =— die¥t 


dew andes ..(Ixvii), 
where 


> (D,— dy) (Desi — des), & (Di — d)? ete. are terms representing the secular change, 
™ ™m 


$F_k$ and $G_k$ are functions of the sessional change, and

$R_k''$ and $S_k''$ are the correlation coefficients and standard deviations of the residuals
left after secular and sessional changes have been removed.


In two experiments it has been found that $R_1$ is greater than +.80, which shows
clearly that the estimates have not been distributed randomly in time. 

The coefficients $R_k''$ appear to fall off in geometrical progression, and to be
closely represented by expressions of the form $qr^k$, in which $q$ and $r$ are constant
for any experiment; it has been found that the introduction of the quantities $F$
and $G$ in equation (lxvii) in addition to the secular terms, is only necessary if there
is a significant sessional change which repeats itself in series after series. Thus in
Experiment C, where there was no such change, $R_k$ could be expressed by the
relation


v/a 1 
qr*S, S ki + pm = (D, = d,) (Des — dps) 
k= EE —_—— : .. (xviii). 
9,/2 + - D, — d,)?} 3S’ ne? + — 3S (Dear — den? 
J's +7, = 1 ay} | k+ +o at k+1 k-+1) 
A tentative interpretation has been given to the results of this analysis. The
observations in Experiment A suggested that there was some physiological signi-
ficance in the distinction between the secular and sessional changes, and this was
confirmed in Experiment B, where it was found that there was evidence of a linear
sessional change acting in the opposite direction to the secular change. A discussion 
of the values of the partial correlation coefficients 7,,., (personal equation and order, 
time constant) and rz, , (personal equation and time, order constant) suggested that 
if the interval between the successive series were made very short, it might not be 
sufficient to break the effect of the sessional change. The correlated variations 
which have been found to follow the law $R_k'' = qr^k$, have been considered as in some
way separate from and superimposed upon the other more steady changes. Starting 
from the tentative assumption that there is little or no partial correlation between 
the observer's true estimates at intervals greater than one—that is to say that the 
observer's judgment at any moment is only influenced by the judgment immediately 
preceding, and only through this and not directly by the earlier judgments—it has 
been shown that the constant $q$ in the relation
$$R_k'' = qr^k$$

can be accounted for by the presence of uncorrelated accidental errors which are 
superimposed on the correlated variations in the observer's true estimate. Without 
further investigation it would be difficult to distinguish between what may perhaps 
be termed the physiological and the psychological factors ; in the experiments that 
have been undertaken the variations in recorded judgment depend partly on the 
movements of the hand, so that the former factors are likely to have played some 
part as well as the latter. The successive recording motions of the hand may have 
been correlated as well as the variations in mental estimate. 
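
One way to see how superimposed accidental errors could produce a constant $q$ of this kind is sketched below. It rests on assumptions of ours, not necessarily the exact model of the paper: the true estimate is taken to follow a first-order scheme in which each value depends directly only on the one before it, with correlation `r` between successive values, and each recorded judgment is the true estimate plus an independent error. Under those assumptions the correlation between recorded judgments `k` steps apart is `q * r**k`, with `q` the share of the total variance that belongs to the true estimate. All function and parameter names are hypothetical.

```python
import random


def observed_lag_correlation(k, r, var_true, var_error):
    """Correlation of recorded judgments k steps apart under the assumed model."""
    q = var_true / (var_true + var_error)  # share of variance due to the true estimate
    return q * r ** k


def simulate_judgments(n, r, var_true, var_error, seed=0):
    """Simulate n recorded judgments: correlated true estimate plus independent error."""
    rng = random.Random(seed)
    sd_true = var_true ** 0.5
    sd_error = var_error ** 0.5
    # Step noise chosen so that the variance of the true estimate stays constant.
    sd_step = (var_true * (1.0 - r * r)) ** 0.5
    true_estimate = rng.gauss(0.0, sd_true)
    recorded = []
    for _ in range(n):
        recorded.append(true_estimate + rng.gauss(0.0, sd_error))
        true_estimate = r * true_estimate + rng.gauss(0.0, sd_step)
    return recorded


def sample_lag_correlation(series, k):
    """Product-moment correlation between the series and itself shifted by k (k >= 1)."""
    first, second = series[:-k], series[k:]
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    cov = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    var_first = sum((a - mean_first) ** 2 for a in first)
    var_second = sum((b - mean_second) ** 2 for b in second)
    return cov / (var_first * var_second) ** 0.5


if __name__ == "__main__":
    r, var_true, var_error = 0.8, 1.0, 0.25  # illustrative values only
    judgments = simulate_judgments(100_000, r, var_true, var_error)
    for k in (1, 2, 3):
        theory = observed_lag_correlation(k, r, var_true, var_error)
        sample = sample_lag_correlation(judgments, k)
        print(k, round(theory, 3), round(sample, 3))
```

With no accidental error (`var_error = 0`) the sketch gives `q = 1`, so that the coefficients reduce to the plain geometrical progression `r**k`.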

The importance of the results of course depends on how far they may be con- 
sidered as typical of any practical series of observations made by the astronomer or 
the physicist. Experiments were admittedly chosen in which it was expected that 
the variations in judgment would be large, and for the experienced observer working 
at the type of observation in which he has had much practice, the errors would no 
doubt be smaller, but it seems to me likely that the phenomena which have been 
discussed will be present in the judgments of other observers even if on a smaller 
scale. Experience and accuracy may be gained by practice, but it does not follow 
that the correlation between successive judgments will disappear. The secular and 
sessional changes may be small, but if rough comparisons of only the yearly mean 
personal equations of different observers are made, the finer changes, which may 
be of considerable importance in a combination of observations, cannot be recognized. 
The Law of Normal Errors requires but two constants to describe adequately any 
series of observations:

(1) the mean, 

(2) the standard-deviation, 
while the introduction of a third may be necessary if a gradual secular change in 
personal equation is noticed. But the more generalized Theory of Errors discussed 
in the preceding sections requires more detailed information and a greater number 
of constants to define the character of an observer's personal equation and variations 
in judgment. We shall require to know how the personal equation and the standard
deviation vary both within a session and over long periods of time, and if there is
any correlation between successive judgments, what is the form of the function $\psi$,
which gives the value of the successive correlation coefficients in the relation

$$R_k = \psi(k).$$

It is only by a detailed analysis of the observations themselves or of others 
carried out ad hoc, copying them as closely as possible, that full information on 
these points can be obtained ; but if the possible complexities which may be present 
in the variations of judgment are fully realised, a great deal may be done in prac- 
tical cases by the arrangement of the observations and the combination of the 
results, to eliminate the factors whose magnitude is unknown and to correct for
others which are more easy to ascertain. 


I have heartily to thank Miss I. McLearn for making the diagrams for Figs. 3, 
4, 8, 12, 17, 19 and 20, and Miss M. Noel Karn for assistance in some of the 
computation. 





[Plate I (Biometrika, Vol. XIV, Part I). Warren, Inheritance in Foxglove. Figs. I—IX. Pelorism of various intensities. Fig. X. Split corolla.]














INHERITANCE IN THE FOXGLOVE, AND THE RESULT 
OF SELECTIVE BREEDING. 


By ERNEST WARREN, D.Sc. Lond. 





In Biometrika, Vol. XI. pp. 302—327, 1917, the author published a preliminary
report on the earlier results obtained in the breeding of foxgloves ; and the present 
paper contains some account of the final results of the selection experiments. 

In 1914 ten foxglove plants (Digitalis gloxiniaeflora), obtained from various
sources and of different characteristics, were crossed among themselves and also 
self-fertilised. In subsequent years, 1915—19, new generations were obtained 
chiefly by the self-fertilisation of selected parents. The measurement, or when 
not possible the grading, of certain characters (pelorism, colour, size of flower, 
spotting of flower, etc.) was undertaken in all the generations in order to deter- 
mine the effect of selection when selfing alone occurred in an apparently pure 
race.

1. PELORISM. 

Mendelian inheritance occurred in a typical fashion. A peloric plant crossed
with a non-peloric plant produced non-peloric offspring. On selfing these, or 
crossing them together, there resulted on the average one peloric to three non- 
pelorics. 

Of the 10 parent plants two exhibited the peloric condition in a fully developed 
form, and the rest were non-peloric. The character was very perfectly recessive, 
and by breeding, it was found that three of the remaining plants were really 
heterozygous, while all the others were non-peloric and homozygous. 

It was soon observed that the peloric condition was by no means a clearly 
defined and fixed character. Pelorism in the foxglove may be regarded as an 
abnormal lack of power to produce internodes between the flower-buds, and con- 
sequently there may result considerable fusion of such buds with one another. 

The maximum stage of pelorism is seen when the main-axis is short and 
abruptly ceases to grow in height. Only two or three normal flowers may be 
produced by the axis, and its blunt, sharply truncated end is surrounded by a 
whorl of bracts or sepals, petals being absent. Sometimes a ring of sessile anthers 
occurs (Pl. I, figs. I, II).

In typical pelorism the inability to produce internodes affects the terminal 
portions of all of the flower-axes of a plant, both central and side-axes. A variable 
number of flower-buds fuse and the corollas unite and may form a large sym- 
metrical cup or saucer of some ornamental value, but the sepals mostly remain 










separate (figs. III, IV). When numerous flower-buds fuse a dense rosette may be
formed by the petals, and the result is not pleasing. The peloric or crown-flower 
opens early, often before any of the normal flowers. After the crown-flower has 
faded, the main-axis usually grows through the centre of it, and may even produce 
a second crown-flower (fig. VI); but in the case of the side-shoots the axis generally 
ends in an ovary and no further growth occurs (fig. V). 

If the peloric tendency is not so well-marked, the main-axis may be only 
slightly affected by the suppression of several internodes, and by the partial 
fusion of flower-buds, at a variable distance above the lowest normal flower of the 
axis. Sometimes a considerable number of internodes may be unduly shortened, 
so as to produce excessive crowding of flowers which do not actually fuse (fig. VII),
and frequently a strongly marked spiral bending of the axis occurs (fig. VIII). 

At other times the suppression of the internodes may occur only high up on 
the flowering axis close to where it normally ceases to grow (fig. IX). 

When the central axis is strongly peloric the side-axes are invariably so, and in 
all other cases the side-axes exhibit greater pelorism than the main-axis. 

Finally, the main-axis may be quite normal and show no peloric tendency, but 
the side-axes may still be strongly peloric. 

The last trace of pelorism in a plant is shown when only one or two of the 
weaker side-axes exhibit some slight sign of a peloric tendency. 

It is unfortunate that it has not been found possible to devise any practical 
method of measuring the intensity of pelorism, and therefore the plants have been 
arranged in four grades. 

0° grade =no peloric tendency. 
1°— 25° grade = those in which the central axis is non-peloric, but the side- 
axes exhibit some peloric tendency. 

26°— 50° grade = main-axis non-peloric, but side-axes may reach full pelorism. 

51°— 75° grade = main-axis partially peloric, side-axes fully so. 

76°—100° grade = plants ranging to complete pelorism in all axes. 

In the generations produced from 1914—19 there were in all 128 fertilisa- 
tions of different classes of individuals, recessive (peloric), homozygous dominant 
(non-peloric) and heterozygous dominant (non-peloric) plants, and families were 
raised. In the table on p. 105 the experimental and theoretical results are 
compared. The fertilisations of the classes DD x DD, RR x RR, and DR x DR 
include both selfing and crossing. The sum totals of the experimental and 
theoretical results are remarkably close; being, crowned, 1019 experimental and 
1013 theoretical ; non-crowned, 1169 experimental and 1175 theoretical. 

It must be noted here that a plant was recorded as "peloric" or "crowned" if
it exhibited the least tendency towards pelorism in any of the axes. Taking all
the classes or groups together it may be said that the inheritance of the quality of
pelorism is typically Mendelian. The group RR x RR should include no non-
crowned offspring, and the 7 which occurred were obtained by gradual selection.


The group in which the experimental result diverged the most widely from the 
theoretical result was DR x RR (heterozygous plants crossed with recessives) and 
it would be interesting to know whether such is generally the case in Mendelian 
inheritance. 
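
The Mendelian expectations for the pelorism counts, and the size of the divergence in the group DR x RR, can be sketched as follows. The counts are those of the table on p. 105 as we read them; the expected fractions are the ordinary Mendelian ones for a recessive character. The chi-square statistic is our own addition for illustration and is not a test used in the text; all names are hypothetical.

```python
# (pairing class, number of offspring, crowned offspring observed,
#  Mendelian fraction of crowned offspring expected for a recessive character)
PELORISM_RESULTS = [
    ("DD x DD", 266, 0, 0.0),
    ("RR x RR", 741, 734, 1.0),
    ("DR x DR", 777, 187, 0.25),
    ("DR x DD", 93, 0, 0.0),
    ("DR x RR", 156, 98, 0.5),
    ("DD x RR", 155, 0, 0.0),
]


def expected_crowned(n_offspring, fraction):
    """Expected number of crowned offspring, rounded to whole plants."""
    return round(n_offspring * fraction)


def chi_square_two_classes(n_offspring, observed_crowned, fraction):
    """Chi-square for crowned versus non-crowned counts against the expected fraction."""
    expected_yes = n_offspring * fraction
    expected_no = n_offspring - expected_yes
    observed_no = n_offspring - observed_crowned
    return ((observed_crowned - expected_yes) ** 2 / expected_yes
            + (observed_no - expected_no) ** 2 / expected_no)


total_observed = sum(obs for _, _, obs, _ in PELORISM_RESULTS)
total_expected = sum(expected_crowned(n, f) for _, n, _, f in PELORISM_RESULTS)
chi2_dr_rr = chi_square_two_classes(156, 98, 0.5)

print("crowned, experimental:", total_observed)
print("crowned, theoretical: ", total_expected)
print("chi-square for DR x RR:", round(chi2_dr_rr, 2))
```

The close agreement of the totals conceals the fact that nearly all of the discrepancy is concentrated in the single class DR x RR, which is the point made in the paragraph above.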


























Gametic Nature | Number of | Number of | Number of Crowned Offspring | Number of Non-Crowned Offspring
of Pairings    | Families  | Offspring | Experimental | Theoretical  | Experimental | Theoretical
---------------+-----------+-----------+--------------+--------------+--------------+------------
DD x DD        |     16    |    266    |       0      |       0      |      266     |     266
RR x RR        |     43    |    741    |     734      |     741      |        7     |       0
DR x DR        |     38    |    777    |     187      |     194      |      590     |     583
DR x DD        |      5    |     93    |       0      |       0      |       93     |      93
DR x RR        |     12    |    156    |      98      |      78      |       58     |      78
DD x RR        |     14    |    155    |       0      |       0      |      155     |     155
---------------+-----------+-----------+--------------+--------------+--------------+------------
Totals         |    128    |   2188    |    1019      |    1013      |     1169     |    1175











The Inheritance of the Degree or Intensity of Pelorism. 

If a peloric plant be crossed with a non-peloric homozygous dominant, the 
offspring are heterozygous and non-peloric, and if these are self-fertilised or crossed 
together the peloric character re-appears in an apparently unchanged and un- 
diluted condition. If, on the other hand, a strongly peloric plant is crossed with 
a weakly peloric one the offspring are more or less intermediate, and if the 
offspring are selfed or fertilised together the intermediate nature of the peloric 
character tends to be retained. 

In the accompanying table A, B, C, D, E are plants of various gametic con- 
stitution. On selfing (A) the offspring were all fully peloric. On selfing some 
5 offspring, A, 2—9, the plants produced were all essentially fully crowned. 

On crossing two recessive plants (A and E) of different peloric intensities (see
bottom of table) the offspring tended to be intermediate. 

On crossing (A) with an ordinary plant (B) the offspring were non-peloric and
heterozygous. On selfing two of these plants, (A x B) pls. 2 and 7, the offspring 
were either fully peloric, or non-peloric (heterozygous and homozygous). On 
selfing two recessives, (A x B) 2, pls. 8 and 9, obtained from (A x B) pl. 2, the 
offspring were all nearly completely peloric. Thus, there was no clearly marked 
dilution or apparent contamination by crossing a peloric plant with a non-peloric 
one. When, however, the same recessive plant (A) was crossed with a hetero- 
zygous plant (C) having in its gametes a weak peloric tendency of about 35° 
there was much variation in the offspring, and on selfing some of these plants, 
(A x C) 1, 2, 7, 11, and raising a new generation it was obvious that considerable 
dilution of the peloric tendency had occurred. On crossing the same plant (A) 
with a heterozygous plant (D) having a stronger peloric tendency (75°) in its 
gametes it was clear that in the next generation raised (A x D) 6, 5, 11 less 
dilution had taken place than in the former case. 








106 


Parentage 


A (100° pelorism) 
Selfed=2Ax RR 


A Q (100° sabeatendd 
Bg (0° ichiitiinis and 


homozygous) 


Ax B=RhRx DD 


Inheritance in the Foxglove 


Pelorisn 


i— Various Pairings. 








C China 
Selfed=DR x DR 


A @ (100° Seen | 


C ¢ (non- decides 
heterozyg ZOU with, | 
say, 35° pelorism in | 
gamete) | 
AxC=RRx DR 


dD catia sous selfed | 


A 2 (100° pooristn) | 


Dg (non- canal and 
heter ozygous Ww ith, 
say, 75° pelorismn i in | 

qn) 


A @ (100°) x # g (50°) | 





| 2 
Peloric Offspring | 6 | Peloric Offspring :- 
o | = 
me ae ——_! § Offspring (selfed) —_—_—,— & 
100°, "75° | 50°| 25°) 8 /100°| 75° | 50°| 25°) 5 
oes ORE es a eee eerie fad is at WEE. Ba 
33 0) 0) 0) OF Apl2(100°pelorism) | 13) 1/0) 0] 0 
ee ae A pl.3 ji | 3} 0; 0]0] o 
| | A pl. 4 a iS Ot) oO 0 
| A pl. 6 nd | 6/0] 0] 0 0 
A pl. 9 Z | 3} 010] 6] © 
| (0G Ga Same Perea EERE aren 4 nares | —-'— 
| (A x B) pl. 2 (non-peloric and | 6/0 |.0 | 0 | 2 
0; O 0 O | 13] heterozygous) | | 
| | (A x B) pl. 7 (non-peloric and | | 510] 0] 0] 2 
| heterozygous) | 
| (A x B) 2 pl. 8 (100° pelorism) | 12 | 1 | 0 | 0 0 
| | | (Ax B) 2 pl. 9 is | ds 0|;0}]0] 0 
| [Ee nie RE | |e 
sos inal SN UNA. SMNIRIORIOEET Vda, ax ca 
} | | } 
1} O| 2] 3 | 13 | = 
| | | | 
ee ae ae 
| (A x C) pl. 1 (75° pelorism) | 20) 4 | 9 | 2 | 0} 
4) 33 3 | 1 7 (A x C) pl. 2 (50° pelorism) | 6 | 7 SESE... | 
(A x C) pl. 7 (heterozygous) -|- 2 | 0 | 10 
| | (Ax C) pl. 11 ii l— Sod ow 
5 oe ee 
| | | | 
te ee | ae 8 
j ] | j 
Seek ie oe | | = 
_ | Sa 1 — — } ' — 
| (A x D) pl. 6 (100° pelorism) | 17 | e216 | 0 
i} SF] OO} oO | (Ax D) pl. 5 Se} Oo.) EO | 0 
(A x D) pl. 1 (75° pelorism) 27 | 5 : 1 e412 
| | | 
| | 
4} 1| 3] 0| 0 | | a 



































In the last generation it will be seen that there was no sharp separation of the
plants into two groups attributable to the two grandparental factors. Thus, in
the case of (A x C) pl. 2 (50°) the offspring are not clearly divisible into those of
100° resembling A, and those of 35° attributable to C; in other words there was
no obvious segregation into two degrees of pelorism.


On the factorial and chromosome hypotheses we must suppose that the factor 
or factors governing the peloric character tend to become mutually changed and 
intermediate in nature when the male and female chromosomes containing the 
factors for the two degrees of pelorism lie alongside each other in the zygote. 


It will be of interest to obtain a general measure of the strength of inheritance
between mid-parent and offspring with respect to the transmission of the degree
or intensity of pelorism. For this purpose only recessives were used, involving
30 mid-parents. Employing Prof. Karl Pearson's method the accompanying table
gives the correlation surface.

Pelorism—Correlation Table—Recessives. Mid-parent and Offspring.

Offspring. Grade of Pelorism.























2 2 ws “ 
Mid-parents. S ° S ° 
Grade of L | Totals 
Pelorism 3 a 2 Si 
| | 
— 95° 6 | 23 sO 18 49 
26°— 50° 58 | BI 68 | 11 198 
51°— 75° 64 31 15 — 110 
76°—100° 143 14 i 5 173 
| 
| Totals 271 | 108 es ial es 530 





The coefficient of correlation, calculated from the table, between mid-parent
and offspring is .52. The result can be regarded as only a very rough approxi-
mation, since a satisfactory method of measuring pelorism has yet to be found.
The figure obtained is somewhat low, but it would seem to indicate that the in-
heritance of the degree of pelorism is of the nature of ordinary blended inheritance.

The point of interest to notice is that the union of two peloric plants of 
different peloric intensities influences the gametes, while the union of a peloric 
plant with a homozygous non-peloric plant does not very readily affect the purity 
of the gametes with respect to pelorism. 

Pelorism. Effect of Selection in a homogeneous race. 

A peloric plant (C) with pelorism of about 85° intensity was self-fertilised, and 
the offspring, 16 in number, were as follows: 7 with 100°, 4 with 75° and 5 
with 50° of pelorism. 





























< 
| | Crowned Offspring | 2 Crowned Offspring | % 
Parentage | 5 Parentage Sern s ie 
| (Self-fertilisation) | | ] S) (Self-fertilisation) | Ss) 
| |100 | 75° | 50° | 25° | 3 100°! 75° | 50° | 25° | 2 
| ihe! BAS ass = 
| | | | ! 
C (85°) | 7 4)/5|0/)0 | 
| | ant hee | C 2, 11 (75°) C 4S So 1a 
ie | 
Tr ee a | | 
| | é3 (50°) | +e hes | 10}; 0; 0 C 2, 2 (50°) | 2116110; 0 |} © | 
| | | | ae | 
| L —_|—__-__|—--L 2, 8 (50°) 1 2D | AS ae 1 Ol dO 
| Le7 OO)... a.) DE 8 Se fe 
| } | | | 
| eT | | 
| ©7,10(25°) ...| O| 2/18] 2) 5 oe 
| | | | eae 
| ye ae g0080" eee —|— = bs | | 
| , 10, 20(25°) | O oO} 1 2 7 
ee | | |; Lt 07, 10, 20, 4 (0°) | 0 | 0| 0 | 0 | 6 
| | | | | } 













Two of these plants of 50° (C 2 and 7) were selfed, and the generation raised 
exhibited a lowered pelorism. The various selections made and the results 
obtained are shown in the accompanying table. It will be seen that finally on
the selfing of plant C 7, 10, 20, 4 (0°) only non-peloric offspring were obtained. 


2. GENERAL COLORATION OF THE COROLLA. 


As described in the previous report (loc. cit.) the intensity of the purple 
coloration was measured by comparing it with a colour-scale founded on the 
intensity of colour by transmitted light of varying depths of a standard colour- 
solution. 

Purple and white foxgloves exhibit the ordinary Mendelian relationship, purple 
being dominant. A confusing aspect of the problem is introduced by the fact 
that “white” foxgloves are not necessarily entirely white, since they may exhibit 
a faint purple coloration which on the colour-scale adopted may amount to
about 5. On crossing such a plant with an ordinary purple plant segregation 
occurs when the heterozygous offspring are self-fertilised. Any higher coloration, 
say 10—15, does not exhibit segregation, but gives a blended inheritance, and 
such a plant is to be regarded as a very pale purple one and not “ white.” From 
certain observations that have been made it is probable that a similar condition 
occurs in the Blue Agapanthus lily, since some of the “ white” plants have flowers 
faintly tinged with blue. It is quite likely that the phenomenon is general, and it 
may throw an important light on the physical theory of heredity. Possibly it 
may be surmised that a factor for a coloration of less than 5 units is unable 
to blend with, or influence, the factor controlling a higher coloration, in that we 
have reached the lowest dynamic unit. 

Of the ten original plants, five were purple and homozygous, four were purple 
and heterozygous and one was white or recessive. These were very variously 
crossed in all manner of ways. In the accompanying table the experimental 
results are compared with the Mendelian expectation for the different gametic 
pairings. 

General Coloration of Corolla—Breeding Results. 




















Gametic Nature | Number of | Number of |           White             |           Purple
of Pairings    | Families  | Offspring | Experimental | Expectation  | Experimental | Expectation
---------------+-----------+-----------+--------------+--------------+--------------+------------
DD x DD        |    120    |   1620    |    2 + 3     |       0      |     1615     |    1620
RR x RR        |     17    |    336    |     330      |     336      |        6     |       0
DR x DR        |     50    |    785    |     190      |     196      |      595     |     589
DR x DD        |     11    |    103    |       0      |       0      |      103     |     103
DR x RR        |      8    |     76    |      24      |      38      |       52     |      38
DD x RR        |      8    |     87    |       0      |       0      |       87     |      87
---------------+-----------+-----------+--------------+--------------+--------------+------------
Totals         |    214    |   3007    |     549      |     570      |     2458     |    2437
















In the gametic group DD x DD (homozygous purple x homozygous purple) 
there were 1620 offspring. These should have been all purple, but there were two 
white plants which occurred in two deeply coloured families and three white 
plants which occurred in one pale-coloured family. I do not believe that there 
was contamination, and it is probable that the two former plants were sports, 
while the three latter plants were produced by selection. 


In the group RR x RR (white x white) there were 336 offspring, and these 
should have been all white, but there were six pale-coloured plants. The difficulty 
in distinguishing a tinged “ white” plant from a pale-coloured plant may account 
for this result, but I favour the view that we are here witnessing the beginning of 
a coloured race. 


The result given by DR x DR (heterozygous purple x heterozygous purple) is 
very closely Mendelian. Out of 785 offspring there were 190 white plants while 
the expectation was 196. 

Heterozygous plants crossed with dominants (DR x DD) gave nothing but 
coloured plants, and this was also the case with dominants crossed with recessives 


(DD x RR). 


The gametic group DR x RR (heterozygous plants x recessives) gave a result 
which diverged rather widely from the expectation: there were insufficient whites, 
there being 24 whites and 52 purples instead of 38 of each. The numbers are 
somewhat small for drawing conclusions, but it is important to notice that in the 
character of pelorism it was the same gametic group which diverged the most
widely of all the classes from the theoretical expectation. On the chromosome 
hypothesis it may be conjectured that possibly preferential pairing of the male 
and female chromosomes may explain the discrepancy. 
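
The size of the divergence in the group DR x RR can be put on a common footing for the two characters by expressing each deviation in units of its binomial standard deviation. The following is a minimal sketch under the assumption that each offspring is an independent trial with chance one half of being recessive; the function names are ours, and the exact tail probability is included only as an illustration.

```python
from math import comb, sqrt


def binomial_deviation(n_offspring, observed, p):
    """Deviation of an observed count from its expectation, in binomial standard deviations."""
    mean = n_offspring * p
    standard_deviation = sqrt(n_offspring * p * (1 - p))
    return (observed - mean) / standard_deviation


def lower_tail_probability(n_offspring, k, p):
    """Exact probability of k or fewer recessives among n_offspring."""
    return sum(comb(n_offspring, i) * p ** i * (1 - p) ** (n_offspring - i)
               for i in range(k + 1))


# Colour: 24 whites among 76 offspring, 38 expected.
z_colour = binomial_deviation(76, 24, 0.5)
# Pelorism: 98 crowned among 156 offspring, 78 expected.
z_pelorism = binomial_deviation(156, 98, 0.5)
# Chance of 24 or fewer whites arising by random sampling alone.
p_colour = lower_tail_probability(76, 24, 0.5)

print("colour, deviation in standard deviations:  ", round(z_colour, 2))
print("pelorism, deviation in standard deviations:", round(z_pelorism, 2))
```

On this reckoning the two deviations are of nearly equal magnitude but of opposite sign: a deficiency of recessives for colour, and an excess of recessives for pelorism.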


The Inheritance of the Intensity of Coloration.


On crossing a purple homozygous plant with a white plant the offspring were 
all heterozygous and all coloured, but the intensity of the coloration was mostly 
reduced very considerably. On selfing these offspring the next generation yielded 
some homozygous dominants in which the original colour-intensity of the grand- 
parent was regained; thus, at first sight it appeared that there had been no real 
dilution of the colour by crossing with the white. This was my first impression 
from the earlier results, but with more extended experience I found that there 
was certain evidence that the crossing with the white did have some deleterious 
action on the intensity of the coloration of the dominant grandchildren, although 
the coloration which appeared was much greater than a half and half blend with 
white. 


If two homozygous dominants of marked difference in colour-intensity were 
crossed, the offspring tended to be intermediate. On selfing these offspring the 
next generation was similarly intermediate, and there was no segregation into the 
two different intensities of the grandparents. Thus a true blend of the two 
intensities had taken place. 







In the accompanying table the results of some instructive crossings and self-
fertilisations are given. In Series I two dominants (E and F) of different colour-
intensities were selfed and the families raised showed that the parents were homo-
zygous. On crossing (E) and (F) a family of intermediate offspring was obtained.


Colour-Intensity—Various Pairings.















































| Coleas: ‘Seale—Oftspring | 
“ Calonted ned |. ‘* Waiter ” 
No. of | Aametic Mid- |—7— ae “ | -| Mean of 
rag Con- Parentage Parental}! = | = | co] m1 al] al wl] w ; | Coloured 
stitution | Colour al SO), a&/ So! &/ ola] 6] om ° |Offspring| 
| CEP CRM ent ete ee eam 
fed 2/8) Sle) elalele)s| » 
| | 1% 3 2 a 
— = | 
I DD DD | Feat’... ws ee | «90 F—1 8] 0] 8) Bi] —|—j= | — | a 
|DDxDD\F ages Sy pee we 68 —_— | seat at aa 6 B | — | — | —| a | 64 
|DDxDD | QEx oF oo 79 a eee cree I ae 5}—|—} — as | 7] 
| DDx DD | (Ex F) pl. 16 (selfed) ase |,. Com —|—]| 2 6; 2;—;{/—;]—)] — = | 87 
DD x DD | (Ex Py - 9 ee, ee 61 —{|—{|;—;—| 1/10} 1}—|]— me | 56 
= =e eee 4 = ee =e 
i | DDx RR | 9 Ex g (Warr) Le el ee art ot OR 2) a a 
DRxDR wnt ae 18 (selfed) 71 —|;—! 1 l | a) mig 1 | —|} 2 | 66 | 
| | | | | 
2 | aps an | 7, 
lil | DDx DD B (selfed) ao. 3/o0;2) 3] 6] 1;—/-—/—}| — | 102 
DDxRR | 2Bx go (Waite) ee -}—|—| 1] 4) 4;/1}1]/-| — | 69 
DRx DR - x Waite) pl. 1 (selfed)| 80 |—|—|— 1} 2;—j]—|— —. _ a 
DRx DER | (Bx Waite) pl. 5 (selfed) | 32 a Sant add “peal Gas Seek 3 |— 5 | 3] 
- _— —_ '--—— ——|-— Se: Se LE eB 5 
IV | DDx DD | B(s selfed) .. “a: ar 3/0) 2/ 3/6) 1}—|—|—| ~— | 102 
DRxDR A (slfed). oy oe A oe | hee) SANT eT Beets, 7° | 
DDxDR 9 Bxeé A 3 82 2,;O;1) 4) 4) 2);—|—)]— = | 105 
DDx DD (Bx A) pl. 7 (selfed) ...| 130 tr] 1 | 3 /12}13/—/—|—}]— a 87 
DDx DD | (Bx A) pl. 2 (selfed) ...| 65 —- i —- — 2/13) 12 | — = = 67 
V | DDxDD | B (sete) = oo ath 95 = tah 0 se a a SS Sag ta 102 
| DRx DR | C (selfed)... = ax 34 add Sean Gene) (yee Meet Gal te. 2\/— 4 40 
| DDx DR | |9Bx gC ean ne 65 —f—/—| =| 2] 241 ofa] = 53 
| DDx DD | (Bx) pl. 8 (selfed) ...) 50 See pag ee ty ely eo les ig eee, = 50 
| DDx DD | pl. 4 | 50 ee eg a ee ee 2s | eee = 59 
| DDx DD pl. 7 | 58 —|—|—|—/ 5] 4/1 eu _ 60 
DDx PD | pl. 6 | 68 —|—|}—|—| 6]4);9}]—|— et 54 
DDx DD pl. 1 | 70 —-{|—| 2| 4 | 20; 4| — | — = 74 
| 





Two of these offspring were selected, (E x F) pls. 16 and 9, as widely divergent
from each other as possible, and selfed. In the families obtained there was no
tendency for the occurrence of segregation into the two colour-intensities of (E)
and (F) respectively. There was thus a definite blend, and the means of the two
families approached the respective colour-intensities of the two self-fertilised
plants.

In Series II the same homozygous dominant plant (E), with colour-intensity of
90°, was crossed with a white plant and all the offspring were heterozygous and
intermediate. On selfing one of the darker coloured offspring, no. 18, the dominant
plants raised tended to be of about the same colour-intensity as the grandparent
(E). In Series III a dark-coloured homozygous dominant plant (B) was also
crossed with a white plant. One of the darkest heterozygous offspring (B x White)
pl. 1 was selfed and the coloured plants raised tended to be paler than the grand-
parent, but the family was small.

In Series IV the dark-coloured homozygous plant (B) was crossed with a dark 
heterozygous plant (A). From the offspring raised, two were selected and selfed, 
one very dark and the other moderately dark. The two families included only 
coloured plants, and consequently the parents may be supposed to have been 
homozygous. The moderately dark parent (Bx A) pl. 2 failed to produce any 
offspring as dark as the grandparent (B). 

In Series V the same plant (B) was crossed with a light heterozygous plant 
(C). From the offspring produced five homozygous dominants were selfed, and in 
the five families raised only two plants reached the colour-intensity of the grand- 
parent (B). 

On taking all these results together it may be said that there is evidence for 
the view that crossing a dark race of foxgloves with white plants tends to dull the 
colour-intensity of homozygous dominants of subsequent generations. 


General Coloration—Strength of Inheritance and Effect of Selection. 


In 1914 a dark-coloured homozygous plant (B, ♀) was crossed with a somewhat
pale-coloured heterozygous plant (C, ♂) = DD x DR = III. The offspring would
consist theoretically of approximately equal numbers of dominants and hetero-
zygous individuals. The reciprocal cross (C, ♀ x B, ♂) was also made = II. Several
dominants were selfed and families were raised. Out of these families certain
plants were selected and selfed and new families were obtained. This procedure 
was continued until 1917, and the results are given in the accompanying table. 
The families of the different years are arranged in ascending order of the colour- 
intensities of the parents. On comparing the means of the families with the 
colour-grade of the parents (shown in brackets) it will be at once seen that small 
variations in the colour-intensity of the parents tended to be transmitted to the 
offspring. It is obvious that the table exhibits the effect of selection in self- 
fertilised homozygous generations. 


For example we may take the following: 

                                             Colour of plant   Mean of offspring
Homozygous plant, II. 1                            70                  74
An offspring of above, II. 1, 4                [illegible]             82
An offspring of above, II. 1, 4, 17            [illegible]             95
Homozygous plant, III. 2                       [illegible]             66
An offspring of above, III. 2, 1               [illegible]             55
An offspring of above, III. 2, 1, 18           [illegible]             85
An offspring of above, III. 2, 1, 18, 28       [illegible]            100

Reverse selection is shown also: 

Homozygous plant, III. 2                       [illegible]             66
An offspring of above, III. 2, 5               [illegible]             57
An offspring of above, III. 2, 5, 5                40                  41
An offspring of above, III. 2, 5, 5, 12            20                  32








112 Inheritance in the Foxglove 


Inheritance of Colour-Intensity among Dominants. 

[Table: for the original DR × DD crosses and for each family of the succeeding dominant generations (self-fertilisation), the distribution of the offspring over the grades of the colour-scale, in classes of ten units up to 120–129, with the colour-grade of each parent in brackets. The family headings and the individual frequencies are not legible in this copy; the rows of family means survive and are given below in the printed order.] 

Means, upper half of table: 53, 59, 54, 74, 51, 66, 54, 62, 66, 69, 64, 64, 82, 78, 70, 59, 65, 71, 81, 95. 

Means, lower half of table: 67, 66, 57, 54, 60, 55, 71, 69, 41, 47, 62, 68, 85, 32, 43, 47, 44, 74, 75, 100. 

















Thus, starting with a plant of about 70 colour-intensity we arrive by selection 
of self-fertilised plants at mean family intensities of 100 in one direction and 32 in 
the reverse direction. 

In another series, starting with a homozygous dominant plant of colour- 
intensity of about 11, I have by selection obtained plants in which the corolla 
exhibited no general tint. On selfing the pale plant no white plants occurred, 

























ERNEST WARREN 113 


and the offspring were all pale-coloured; but when the intensity was decreased by 
selection to about 4, the “white” plants showed Mendelian segregation, for the 
offspring arising from the plants produced from a cross with a dark-coloured 
plant were sharply divisible into strongly coloured and “white” individuals. 

As a further example of selection, I started with a homozygous medium- 
coloured (48) plant (G). This was selfed and a family of 31 coloured plants was 
raised; there were no whites. Thus, the parent plant may be regarded as homozygous. 
A plant (G 3) in this family, not far removed in colour (55) from the 
average, was selfed and the resulting family had a mean colour approximating to 
the colour of the parent. A light-coloured (27) plant (G 3, 20) and a dark (81) 
plant (G 3, 13) in this last family were selfed also, and the two families raised 
tended to resemble their respective parents. In a succeeding generation further 
progress was obtained in securing a dark race and a pale race. The necessary 
details are given in the accompanying diagrammatic table. The families printed 
in heavy type are those leading to a dark race, while those in ordinary type are 
passing into a light race. 


Formation of Light and Dark Races from a Dominant 
(homozygous) G. 

[Table: for each selfed parent (G; G pl. 3; G 3, pl. 20; G 3, pl. 13; and the plants selected from the two last families), the number and colour of the parent and the distribution of its offspring over the scale of colour. The frequencies are not legible in this copy.] 





Correlation Table—Colour-Intensity—Dominants (homozygous). 

Series II and III. 

Rows: parents, grade of colour-intensity. Columns: offspring, twelve grades of colour-intensity (column headings not legible in this copy). 

Parents                                                                  Totals
 30– 39    (entries not legible)                                            17
 40– 49     —    —    —    —    —    —    2    7   26   29   10    —        74
 50– 59    (entries not legible)                                           100
 60– 69     —    —    —    —    —    3   18   37   31   13    1    —       103
 70– 79     —    —    2    2    8   19   54   50    9    0    1    —       145
 80– 89     —    —    —    3    5    4    3    2    0    —    —    —        17
 90– 99     2    5    9    7   15    8    5    5    5    —    —    —        61
100–109     —    —    —    —    —    —    —    —    —    —    —    —         0
110–119    (entries not legible)                                             3
120–129    (entries not legible)                                             9

Totals      2    5   11   13   28   36   91  134  112   69   25    3       529








114 Inheritance in the Foxglove 


In the last table, p. 113, a correlation surface is shown between parents and 
offspring. It is formed from the series of families given in the table preceding the 
last, and arising by self-fertilisation. 


The constants calculated from the table are: standard deviation of weighted 
parents 1.7805 units, and of offspring 1.8962 units, coefficient of correlation .707. 
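
How constants of this kind follow from a correlation table can be sketched in a few lines of Python. This is an illustration only, not the author's own working: it assumes that every offspring is entered once against the class of its parent (so that each parent is weighted by the size of its family), that classes are scored 0, 1, 2, … so that the standard deviations come out in class units, and it uses a small invented table (`demo`) rather than the foxglove data.

```python
import math


def grouped_correlation(table):
    """Return (sd_parents, sd_offspring, r) from a frequency table.

    table[i][j] is the number of offspring in class j whose parent lies
    in class i. Classes are scored by their index, so the standard
    deviations are expressed in class units.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # weighted parents
    col_totals = [sum(col) for col in zip(*table)]  # offspring

    mean_p = sum(i * f for i, f in enumerate(row_totals)) / n
    mean_o = sum(j * f for j, f in enumerate(col_totals)) / n

    var_p = sum(f * (i - mean_p) ** 2 for i, f in enumerate(row_totals)) / n
    var_o = sum(f * (j - mean_o) ** 2 for j, f in enumerate(col_totals)) / n

    # Product moment about the two means.
    cov = sum(
        f * (i - mean_p) * (j - mean_o)
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    ) / n

    sd_p, sd_o = math.sqrt(var_p), math.sqrt(var_o)
    return sd_p, sd_o, cov / (sd_p * sd_o)


# Invented 3 x 3 table, for illustration only.
demo = [
    [4, 2, 0],
    [2, 6, 2],
    [0, 2, 4],
]

sd_p, sd_o, r = grouped_correlation(demo)
print(f"sd parents = {sd_p:.4f}, sd offspring = {sd_o:.4f}, r = {r:.4f}")
```

The function divides by the total frequency and applies no grouping correction; whether the published constants were so corrected is not stated in the text.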


In this table 39 families were involved, as detailed in the previous table. 
The starting points were four homozygous dominant plants occurring in the two 
families raised from the reciprocal crosses (C₁ × B₁) and (B₁ × C₁). 


3. Brown Spots. 


The amount of spotting on the inside of the corolla is not closely correlated to 
the intensity of the general purple coloration of the flower, for even in white 
plants the spots may be numerous and of a deep purple colour. In coloured 
plants the spots were almost always dark purple. As a very rare exception in the 
coloured plants (4 plants in about 2500) some of the spots were russet brown, and 
in the case of the larger spots there was a middle area of brown bordered by a 
margin of purple. In white flowers the spots were fairly frequently brownish- 
green or brown. In such brown spotted white flowers I could never detect the 
slightest tinge of purple on the general surface of the corolla, while in purple- 
spotted white flowers a faint tinge of purple could often be seen. The brown 
spots of white flowers might not become visible until the flowers were on the 
point of fading, and in the case of any given white plant it was wholly impossible 
to affirm that brown spots were, or would be, entirely absent from all of the 
flowers. 


With the exception of the four plants mentioned above there was a sharp dis- 
continuity to the naked eye between purple spots and brown spots, intermediate 
conditions being absent. The brown colouring matter may be regarded as altered 
or decomposed anthocyanin. In purple spots a microscopic examination often 
showed a certain amount of decomposition; but, with the exception of the four 
plants, the amount was not enough to alter the colour of the spots sufficiently for 
detection by the naked eye. Thus, the discontinuity lies between a normal small 
amount of decomposition, and an abnormal entire decomposition. It may be 
stated that under ordinary circumstances brown or greenish spots (as seen by the 
naked eye) are linked to a perfectly white corolla, but purple spots occur in both 
purple and “white” flowers, and an apparently perfectly white corolla may also 
bear purple spots. 

If a brown spotted plant is crossed with a purple spotted one the offspring are 
all purple spotted and heterozygous. The brown spotted condition is inherited in 
Mendelian fashion, and is recessive to purple spots. 

No special crossings have been made to investigate the matter, and the results 
which are given below are merely picked out from the records of the numerous 
families which have been raised for other purposes. 

















ERNEST WARREN 115 


In the accompanying table it is useless to include families in which there was 
no taint of whiteness, since all the individuals (except 4 plants out of 2500) had 
purple spots. 

Brown Spots—Families White or Some Taint of Whiteness. 

Gametic      Number     Number        Purple Spotted            Brown Spotted
Nature of    of         of          Experi-    Mendelian      Experi-    Mendelian
Pairings     Families   Offspring   mental     Expectation    mental     Expectation

DD × DD         13         344        344         344             0           0
RR × RR         11         169          0           0           169         169
DR × DR         13         213        166         160            47          53
DR × DD         15         137        137         137             0           0
DR × RR          1           8          3           4             5           4
DD × RR          6          70         70          70             0           0

Totals          59         941        720         715           221         226








It is obvious from the table that the brown spotted condition exhibits Men- 
delian inheritance. 
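
The Mendelian expectations in the table can be checked with a short Python sketch. It takes purple spots (D) as dominant over brown (R), as stated above, and uses the observed counts from the table. The names `PURPLE_FRACTION`, `expected_counts` and `chi_square` are introduced here for illustration, and the chi-square figure for the DR × DR families is an added check that does not appear in the original.

```python
from fractions import Fraction

# Expected fraction of purple-spotted offspring for each pairing,
# with purple (D) dominant over brown (R).
PURPLE_FRACTION = {
    "DD x DD": Fraction(1),
    "RR x RR": Fraction(0),
    "DR x DR": Fraction(3, 4),
    "DR x DD": Fraction(1),
    "DR x RR": Fraction(1, 2),
    "DD x RR": Fraction(1),
}

# (pairing, number of offspring, purple-spotted observed), from the table.
OBSERVED = [
    ("DD x DD", 344, 344),
    ("RR x RR", 169, 0),
    ("DR x DR", 213, 166),
    ("DR x DD", 137, 137),
    ("DR x RR", 8, 3),
    ("DD x RR", 70, 70),
]


def expected_counts(pairing, offspring):
    """Exact expected numbers of (purple, brown) spotted offspring."""
    purple = PURPLE_FRACTION[pairing] * offspring
    return purple, offspring - purple


def chi_square(observed, expected):
    """Pearson chi-square over classes with a non-zero expectation."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e != 0)


for pairing, n, purple in OBSERVED:
    exp_purple, exp_brown = expected_counts(pairing, n)
    print(
        f"{pairing}: purple {purple} (expected {float(exp_purple):.2f}), "
        f"brown {n - purple} (expected {float(exp_brown):.2f})"
    )

# Added check, not in the original: goodness of fit for the one pairing
# large enough to test segregation, DR x DR (expected 3 : 1).
exp_purple, exp_brown = expected_counts("DR x DR", 213)
chi2 = float(chi_square([166, 47], [exp_purple, exp_brown]))
print(f"chi-square for DR x DR (1 degree of freedom): {chi2:.3f}")
```

The exact expectations are fractional (159.75 and 53.25 for DR × DR); the table prints them rounded to whole plants.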


4. INHERITANCE OF CERTAIN SPORT ABNORMALITIES. 


Crenate Margin.—In a homogeneous family of 29 plants there appeared one 
plant in which the free edge of the mouth of the flower exhibited a well-marked 
serrated condition. All the flowers of a main-axis of considerable size were 
similarly affected, and later, lateral flowering axes were formed, and the flowers 
were also serrate. The character was sufficiently marked to be noticeable at a 
casual glance of the plant, and since all the numerous flowers were alike in this 
particular, the character was clearly inherent in the plant, and was not due to a 
chance environmental disturbance influencing a young growing axis or certain 
flower-buds. The plant was self-fertilised, and it was confidently expected that 
the character would reappear in the offspring. Out of a family of some 20 plants 
12 flowered and no sign of the peculiar serrated condition could be detected in 
any one of the plants. Here we have a conspicuous character in a large healthy 
plant affecting every flower of all the flowering axes, and yet apparently it was 
incapable of being transmitted to the offspring. 

Split Corolla.—In a homogeneous family (XXXIV) of 27 plants there appeared 
one plant in which in the great majority of the numerous flowers the corolla was 
symmetrically divided into an upper, a lower and two lateral pieces by four lateral 
splits extending down to the base of the flower. The plant was a large, healthy 
one and produced a number of similar lateral axes. At least 90% of the flowers 
were completely split (Pl. I, fig. 10). 

In a family (VIII 7) unrelated to the above there were 16 plants, and of 
these, four plants were similarly affected. In one of these plants practically all 













116 Inheritance in the Foxglove 


(99%) of the flowers were entirely split into four pieces, while in the remaining 
three plants some 50–60% of the flowers were split. All the plants were large 
and vigorous. It was thought that very probably the character would exhibit 
Mendelian inheritance. The results of crossing and selfing are shown in the 
accompanying table. 

Inheritance of Split Corolla. 

[Table: for each cross or selfing, the number of offspring falling in each class of percentage of flowers split: no splitting, 1–14, 15–29, 30–44, 45–59, 60–74 and 75–99. The column headings giving the parentage of each family, and most of the frequencies, are not legible in this copy. The row "No Splitting" reads, in the printed order of the fifteen columns: 6, 12, 8, 7, 2, 3, 1, 0, 0, 9, 26, 12, 15, 10, 10.] 


The first mentioned plant (XXXIV 4) with 90% of the flowers split was 
crossed with an unrelated plant with some 99% of the flowers split (5th vertical 
column of table). Of the 17 offspring 8 plants were wholly unsplit, while the 
remainder exhibited the character in a very greatly weakened condition. Three 
of these offspring, S. J. nos. 9, 18 and 6 having 0%, 13% and 18% of the 
flowers split respectively, were selfed, and the families raised all contained some 
plants very conspicuously split, but the character was more marked in the two 
families raised from parents 18 and 6 which showed some degree of splitting. In 
a subsequent generation (S. J. 18 pl. 4 and S. J. 18 pl. 10) raised by selfing, the 
character became very strongly pronounced. 


An unrelated non-split plant (II 6, 1) was crossed with the first mentioned 
plant having at least 90% of the flowers split (XXXIV 4). In the family of 










ERNEST WARREN 117 


12 plants raised none of the plants exhibited splitting. Two of these offspring 
(R. J. nos. 9, 16) were selfed and no splitting occurred in the two families. 
Another generation was raised from R. J. 16, plant 14 and some re-appearance of 
splitting was detected. The table includes all the split plants which have occurred 
among some 3000 plants which have been under observation. 

The results obtained indicate that heredity has some influence, but the data 
are insufficient for determining the nature of the transmission which does not 
bear a Mendelian aspect. 

Creased Upper Lip.—In a certain plant in the majority of the flowers the 
upper surface and lip exhibited a conspicuous pucker or crease. This plant was 
crossed with an unrelated normal plant with no crease. Most of the seedlings 
were killed by the violent elements, but four plants were raised, and in one, 
a number of flowers exhibited a crease, which, however, was much less developed 
than in the paternal parent. The data are scanty, but the hereditary trans- 
mission does not seem to be Mendelian. 

Spontaneous Appearance of White Plants.—Among the numerous homozygous 
dominant coloured families that have been raised a white plant appeared spon- 
taneously on two occasions in two unrelated families. These plants, of course, 
bred true, and as there was no evidence of contamination of the seed the plants 
must be regarded as new sports. 

5. INHERITANCE OF SEED-LENGTH. 

The mean length of the seed varied considerably in different plants. No 
discontinuous variation could be detected, and inheritance was of the blended 
type. Ten seeds were taken at random from one or more capsules of a number 
of plants of certain series and the means determined. The seeds of a capsule 
exhibited a moderate amount of variation, but they were monomorphic in varietal 
crossings, and not dimorphic as was noticed in an interspecific crossing. The 
distribution was more or less normal. Unfortunately there was very considerable 
variation in the mean size of the seeds in different capsules of the same plant, 
and consequently no very accurate determination of the strength of inheritance 
was possible with this character without an excessive number of measurements. 
As it was, the investigation entailed the measurement of about 1000 seeds. 

A plant, C₁ (mean seed-length 639 units), was crossed with B₁ (mean seed-length 
628 units) and a family was raised; C₁ × B₁ = II. In family II twelve 
plants were selfed, namely II 1, II 2... II 12, the seeds were measured and twelve 
families were obtained. In family II 1 three plants were selfed and the seed- 
length determined, namely (II 1) 1, (II 1) 2 and (II 1) 4. The means of the seed- 
lengths of these three plants were compared with the seed-length of the parent 
II 1. Similarly, for example, in family II 1, 2 two plants were selfed, namely 
(II 1, 2) 5 and (II 1, 2) 20, and the means of the seed-lengths of these two plants 
were compared with the seed-length of the parent II 1,2. The data are given in 
the accompanying table. 





118 Inheritance in the Foxglove 


Mean Seed-length, Parents and Offspring. 

(A "?" marks a figure or designation that is not legible in this copy.) 

Parent (selfed)              Offspring (selfed)
Designation   Mean length    Designation: mean length

II 1             606         II 1, 1: 572;  II 1, 2: 668;  II 1, 4: 649
II 1, 2          668         II 1, 2, 5: 668;  II 1, 2, 20: 642
II 1, 4          649         II 1, 4, 3: 655;  II 1, 4, 17: 674
II 2             528         II 2, 1: 62?;  II 2, 3: 582;  II 2, 5: 687;  II 2, ?: 566;  II 2, 16: 566
II 3             629         II 3, 1: 686;  II 3, 4: 686;  II 3, 15: 672
II 4             592         II 4, 2: 668;  II 4, 6: 657;  II 4, 8: 628;  II 4, 12: 598
II 6             620         II 6, 1: 621;  II 6, 3: 641;  II 6, 4: 670;  II 6, 11: 695
II 6, 11         695         II 6, 11, 6: 665
II 7             547         II 7, 1: 67?;  II 7, 14: 624
II 8             620         II 8, 2: 6?9;  II 8, ?: 629
II 9             653         II 9, 2: 633;  II 9, 3: 629;  II 9, 10: 630;  II 9, 11: 620
II 10            646         II 10, 2: ?;  II 10, 5: 649;  II 10, 7: ?
II 10, 5         649         II 10, 5, 5: 598;  II 10, 5, 10: 629;  II 10, 5, 18: 649
II 10, 7         660 (?)     II 10, 7, 9: 653
II 11            65?         II 11, 8: 61?
II 12            679         II 12, 9: 642


C₁ (self-pollen) seed-length = 639 
B₁ (self-pollen) seed-length = 628 
C₁ (B₁ pollen) seed-length = 642; these last seeds produced fam. II. 


The coefficient of correlation, calculated from the above numbers, between 
parents (selfed) and offspring (selfed) is .378. This is low for mid-parental 
correlation; but as all the generations arose by self-fertilisation we ought to have 
practically no correlation at all according to the pure-line hypothesis, for the two 
original parents (C₁ and B₁) were closely similar to each other in the character 
under investigation. 


6. PURPLE SPOTTING OF THE COROLLA. 


The purple spotting of the lower surface of the corolla-tube and lower lip 
varied greatly in the original parent plants, and the character was obviously 
inherited. The amount of spotting had little relationship to the intensity of the 
general coloration of the corolla, and “white” flowers were sometimes richly 
spotted with purple. 

The percentage area of the lower surface covered with spots was estimated by 
comparing the flowers with a series of diagrams each covered with a definitely 
known percentage of spotting. With practice it was found that sufficiently 
uniform results could be obtained by this method. 




























ERNEST WARREN 119 


In plants which had lost completely the power of producing any purple 
coloration whatever, the spots were brown and usually small and scanty, and 
among such plants an almost entire absence of spots of any kind occasionally 
occurred. We have already seen that with regard to the colour of the spots 
(brown and purple) Mendelian segregation takes place. 

In the inheritance of the amount of purple spotting no Mendelian relationship 
could be detected. The smallest amount of purple spotting met with in coloured 
foxgloves equalled about 1%, and the maximum about 70%. It will be remembered 
that on crossing a dark purple plant with a plant bearing flowers very 
faintly tinged with purple (say, colour 4 of standard), definite segregation into 
“white” and purple plants occurred in the second generation following; but on 
crossing a plant possessing an abundance of purple spots (say, 50%) with a plant 
bearing very few purple spots (say, 2% or 3%) no such segregation was found, 
and the spotting tended to remain intermediate in amount. 

In the numerous crosses that have been made for various purposes the condition 
of the spotting was observed, and it is undoubtedly true that the means of 
the spotting of the families resulting from the crosses tended on the average 
to approximate to the spotting of the mid-parent, ½(♂ + ♀). No difference could 
be detected between the reciprocal crosses of two plants. 
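
The comparison with the mid-parent can be written out as a small Python sketch. The parental values of 50% and 2% are the illustrative figures mentioned above; the family of offspring is invented for the example, and the function names are introduced here and are not the author's.

```python
def mid_parent(male, female):
    """Mid-parent value: half the sum of the two parental values."""
    return (male + female) / 2


def family_mean(values):
    """Mean of a family's individual values."""
    return sum(values) / len(values)


def departure_from_mid_parent(male, female, offspring):
    """Family mean minus mid-parent; near zero for blended inheritance."""
    return family_mean(offspring) - mid_parent(male, female)


# Parents with 50% and 2% spotting; the offspring values are invented.
offspring = [22, 25, 27, 30, 31]
print(mid_parent(50, 2))                            # 26.0
print(departure_from_mid_parent(50, 2, offspring))  # 1.0
```

Since the formula is symmetrical in the two parents, it gives the same mid-parent for a cross and its reciprocal.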


Influence of Selection and Strength of Inheritance in Self-fertilised Generations. 

In this connection details of Series II and III may be given (see p. 120). Plant 
C₁ with 11% spotting was crossed with pollen of plant B₁ with spotting 48% = II. 
Seven of the offspring were selfed and the spotting of the resulting families was 
determined. Subsequently two other generations were raised by selfing. Plant B₁ 
was crossed with pollen of C₁ = III. Four of the offspring were selfed and 
subsequently three other generations were raised by self-fertilisation. 

The distributions of the spotting in the families of the different generations are 
shown in the accompanying table. In each generation the families are arranged 
in the ascending order of the parental spotting (see the top and middle horizontal 
lines). A casual inspection indicates at once that the general trend of the family- 
distributions follows the gradual increase in the spotting of the parents. 


As an example of selection we may take: 

III 2 (9%) selfed produced with others a plant III 2, 5 (15%) 
III 2, 5 (15%) selfed produced with others a plant III 2, 5, 10 (22%) 
III 2, 5, 10 (22%) selfed produced with others a plant III 2, 5, 10, 17 (27%) 
III 2, 5, 10, 17 (27%) selfed produced a family with mean spotting of 39% 

Thus, we have passed from a plant with 9% spotting to a plant with 27%, 
which on selfing produced a family with a mean spotting of 39%. 

With reference to the strength of inheritance two tables are given on p. 121, 
one for parents and offspring, and one for grandparents and grandchildren. The 
respective coefficients of correlation are .560 and .395. This correlation does not 
arise by the mixture of two races which have been sorted out by segregation 








120 Inheritance in the Foxglove 

Purple Spotting—Families from Self-fertilised Parents. Series II and III. 

[Table: distributions of the percentage of purple spotting in the families of the successive self-fertilised generations of Series II and III, the families of each generation being arranged in ascending order of the parental spotting. The family designations and the frequencies are not legible in this copy.] 























ERNEST WARREN 121 


during the different self-fertilised generations. Inspection of the tables shows 
that the distributions of the various families give no indication whatever of the 
occurrence of segregation into little spotted and much spotted plants. The 
gradual rise in the degree of spotting of the different parents is followed by a 
gradual increase in the spotting of the respective families obtained by self- 
fertilisation. The fact that the correlation between the grandparents and grand- 
children is less than that between the parents and offspring is further evidence 
that the small, apparently fortuitous, variations in spotting occurring among self- 
fertilised generations are inherited. This result is opposed to the pure-line 
hypothesis, according to which such small variations are regarded as slightly 
different expressions of the same identical character which remains unchanged in 
its essence from one self-fertilised generation to another. If such were the case 


Correlation Table—Spotting—Parents and Offspring. Series II and III. 

Offspring. Grades of Spotting (thirteen classes; column headings not legible in this copy). 

Parents.
Spotting                                                                      Totals
 0– 3     (entries not legible)                                                  15
 4– 7      —    —    —    —    —    —    —    —    2    7    1    —    —         10
 8–11     (entries not legible)                                                  68
12–15      —    —    —    —    —    1    8   28   40   68   19    5    2        171
16–19      —    1    1    5   14   10   28   34   33   49   20    1    —        196
20–23      2    3    5    4    9   16   25   26   25   17    3    1    —        136
24–27      —    2    5    6    3    8   15   14   10    3    1    —    —         67
28–31     (entries not legible)                                                  50
32–35      —    —    —    —    —    —    —    —    —    —    —    —    —          0
36–39      —    —    —    —    —    —    —    —    —    —    —    —    —          0
50–53      —    —    —    —    —    —    —    —    —    —    —    —    —          0
[label illegible]   —    2    —    1    —    —    —    —    —    —    —    —    —          3

Totals     2    9   14   19   32   45   87  108  138  165   69   25    3        716











Correlation Table—Spotting—Grandparents and Grandchildren. 
Series II and III. 

Grandchildren. Grades of Spotting (twelve classes; column headings not legible in this copy). 

Grandparents.
Spotting                                                                 Totals
 8–11     (entries not legible)                                            101
12–15      1    1    2    5    4   17   21   20   31   10   12    —        124
16–19      2    3    6   12    9   20   22   25   37   15    2    1        154
20–23      2    4    1    1    4    7    9   15   13    3    1    —         60
24–27      —    1    6    4    7    8    ?    ?    —    —    —    —         30
28–31     (entries not legible)                                             40

Totals     7   12   17   30   33   66   66   91  117   47   20    3        509











122 Inheritance in the Foxglove 


the small variations would be fluctuating, non-inheritable variations; but the 
results in the present case are definitely against a supposition of this kind. 

It might be urged by some that the result is really due to the existence of 
genotypes, and that variations within the limits of each genotype are not inheritable. 
The distributions of the families in the table do not indicate the occurrence 
of genotypes of any considerable magnitude. If the genotypes are supposed to be 
very small the practical result would become indistinguishable from the inheritance 
of continuous variations. 


7. RATIO OF BREADTH TO LENGTH OF COROLLA. 

The breadth was measured as the maximum horizontal width across the 
mouth of the corolla of a fully expanded flower in which the anthers had opened; 
the length was the maximum distance measured along the mid-adcauline surface 
with the lower lip stretched out straight in the long axis of the flower. It is 
convenient to express the ratio in the form (Breadth / Length) × 1000. The mean of the 
ratios of the four lowest flowers of an axis was taken as the mean of the plant. 

The original parent plants varied widely in this ratio, and the families raised 
by selfing tended to have the same ratio as their parents. 
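
The measure just defined can be sketched in Python as follows. The function names and the measurements are hypothetical and serve only to show the arithmetic: each flower's ratio is Breadth / Length × 1000, and the value for a plant is the mean of the ratios of its four lowest flowers (a mean of ratios, not a ratio of mean breadth to mean length).

```python
def flower_ratio(breadth, length):
    """Ratio of breadth to length of one corolla, scaled by 1000."""
    return 1000 * breadth / length


def plant_ratio(flowers):
    """Mean ratio over the four lowest flowers of an axis.

    flowers: (breadth, length) pairs ordered from the lowest flower
    upwards, in any one unit of length.
    """
    lowest_four = flowers[:4]
    ratios = [flower_ratio(b, l) for b, l in lowest_four]
    return sum(ratios) / len(ratios)


# Hypothetical measurements in millimetres, lowest flower first.
axis = [(30, 50), (29, 50), (31, 50), (30, 50), (27, 48)]
print(plant_ratio(axis))  # 600.0
```

The fifth flower in `axis` is ignored, since only the four lowest are used.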


A plant bearing wide flowers was crossed with one having narrow flowers, and 
the offspring tended to be intermediate. On selfing these offspring the new 
generation exhibited, of course, considerable variation, but taken as a whole the 
intermediate condition was retained, and there was clearly no segregation into 
wide flowers and narrow flowers. Thus, the different degrees of this character 
blend readily on crossing, and the mode of inheritance is very similar to that of 
the spotted condition. 

The results of a multitude of crossings of plants bearing variously shaped flowers 
have been carefully determined and tabulated, and there is no question about the 
general accuracy of the statement made above. In the present place we may 
confine our attention to the self-fertilised generations of Series II and III (p. 123). 

A plant (♀ C₁) with relatively wide flowers (ratio 608) was crossed with a 
plant (♂ B₁) having relatively narrow flowers (ratio 487). The family (= II) had 
flowers approximately intermediate. The reciprocal cross = III. The distributions 
of the families of the various generations raised by selfing are shown in the 
accompanying table. The families of each generation are given in an ascending 
order of the ratios of the parents. As in the case of the character of spotting it 
will be seen that there is a clearly marked tendency for the mean ratios of the 
families to approximate to the ratios of the respective parents. In none of the 
families do we find any definite segregation into plants with wide flowers and 
plants with narrow flowers resembling those of the two progenitors of the series. 

Wide and narrow races could be raised by selection using only self-fertilisation. 
Thus in family III with a mean ratio of 531 there was a single plant (III 2) 
with as high a ratio as 575. This was selfed and the mean ratio of the offspring 













Series II and III. 


Ratio of Flower—Families from selfed plants. 











ERNEST WARREN 123 

































































































































































































































































[Table of the distributions of the families of each generation, Series II and III; the tabulated frequencies are not recoverable from the scan.]



















was 574. In this family there was a plant (III 2, 1) with a ratio of 533 and the 
mean of offspring = 563. III 2, 1, 18 (ratio 551) produced a family with mean 
561, and III 2, 1, 18, 28 (ratio 598) produced a family with a mean of 606. 

In the reverse direction, through III 2, III 2, 5, III 2, 5, 10 and III 2, 5, 10, 22 
we pass from a parent of ratio 575 to a family having a mean ratio of 477. 

With the data given in the preceding table, correlation tables have been 
prepared for parents and offspring, and grandparents and grandchildren. 


Correlation Table—Ratios of Corolla—Parents and Offspring. 


Series II and III. 





| 
| 









































| 


Offspring: twelve grades of ratio × 1000 (column headings illegible in the scan). An entry marked ? is illegible; - denotes an empty cell.

Parents (grades of ratios × 1000) | Offspring grades, in the order printed | Totals
410-439 | - | - | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | 15
440-469 | - | - | - | - | - | ? | ? | ? | ? | ? | ? | ? | 25
470-499 | - | - | - | - | 4 | 16 | 39 | 35 | 23 | 11 | 3 | 1 | 132
500-529 | - | 2 | 3 | 8 | 15 | 28 | 45 | 40 | 28 | 6 | 2 | 1 | 178
530-559 | - | 1 | 6 | 15 | 29 | 47 | 38 | 32 | 7 | 3 | - | - | 178
560-589 | - | 2 | 12 | 15 | 30 | 34 | 18 | 13 | 2 | - | - | - | 128
590-619 | 1 | 2 | ? | ? | ? | ? | ? | ? | ? | ? | - | - | 49
620-649 | - | - | - | - | - | - | - | - | - | - | - | - | 0
650-679 | - | 2 | 3 | 2 | 2 | 0 | 1 | - | - | - | - | - | 10
Totals | 1 | 9 | 33 | 58 | 95 | 131 | 143 | 132 | 73 | 30 | 6 | 2 | 713


Correlation Table—Ratios of Corolla—Grandparents and Grandchildren. 


Series II and III. 


Grandchildren. 






































| 


Grandchildren: twelve grades of ratio × 1000 (column headings illegible in the scan). An entry marked ? is illegible; - denotes an empty cell.

Grandparents (grades of ratios × 1000) | Grandchildren grades, in the order printed | Totals
440-469 | - | - | - | - | ? | ? | ? | ? | ? | ? | ? | ? | 29
470-499 | - | - | - | - | 2 | 3 | 10 | 15 | 11 | 3 | 1 | 1 | 46
500-529 | - | - | - | - | 4 | 22 | 36 | 25 | 22 | 12 | 2 | - | 123
530-559 | 1 | 3 | 14 | 38 | 37 | 34 | 24 | 16 | 9 | 2 | 1 | - | 179
560-589 | - | 2 | 8 | 10 | 18 | 33 | 21 | 21 | 3 | 2 | - | - | 118
590-619 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
620-649 | - | - | - | - | - | - | - | - | - | - | - | - | 0
650-679 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | 10
Totals | 1 | 7 | 25 | 51 | 63 | 93 | 91 | 86 | 54 | 26 | 4 | 1 | 505








The coefficients of correlation are .601 for parents and offspring and .492 for 
grandparents and grandchildren. The latter figure is somewhat high; but taking 
the results altogether they are incompatible with any notion of pure-lines. 




























8. GENERAL CONCLUSIONS. 


In the various characters that have been dealt with in the crossing of different 
strains of the garden foxglove we have seen that in pelorism, colour of corolla and 
colour of spots, the mode of inheritance is Mendelian with reference to the 
qualities: peloric and non-peloric, purple and white corolla, purple spots and 
brown spots. If, however, there are any marked differences in the intensities 
of these qualities, the mode of inheritance of the intensity of the quality was 
found to be of the blended type. 


The other characters examined were quantitative in nature, such as degree of 
the development of purple spots and the ratio of breadth to length of corolla, and 
these characters blended completely. 


When the intensity of a quality is very slight and approaching zero the 
difficulty arises as to which category the individual should be referred. When 
Mendelian inheritance is in evidence the critical point may apparently be determined 
by the occurrence of segregation. Thus, if a homozygous plant with a very faint 
tinge of purple (say an intensity of about 4) is crossed with a homozygous strongly 
coloured plant, segregation occurs in the so-called F₂ generation, and we obtain 
on the average 1 faintly tinged plant to 3 much more darkly coloured plants. 
When, however, the pale plant has a somewhat greater intensity (say about 10), 
the F₁ and subsequent generations are intermediate, and definite segregation does 
not occur. In accordance with this procedure a plant with flowers having an 
intensity of general coloration which did not reach 5 of the scale was classed as 
“white.” Without employing such a line of demarcation the results obtained 
were wholly unintelligible. 


From the strict Mendelian standpoint, in the example given above, it would 
probably be affirmed that the faint tinge of purple on “white” flowers is not 
really a fractional part of the general purple coloration of coloured plants, but is 
a distinct character governed by a different factor or set of factors in the 
chromosomes. To one who has grown the plants this view appears an artificial one. 
In my previous account I stated that there appeared to be a distinct gap among 
my plants between “white” plants and coloured plants, and that colorations of 
about 8-25 of the scale were extremely rare or almost absent, but I have 
subsequently obtained a number of plants having such intensities of coloration, 
passing imperceptibly down to absolute whiteness. Consequently it is quite 
unlikely that the faint tinge of purple on “white” flowers is anything else than the 
last remnant of a general purple coloration. 

It is quite similar in the character of pelorism, but the difficulty in finding a 
suitable method of measuring this character renders the matter less obvious. 
Thus, it would appear that if a character is not present beyond a certain minimum 
or unit quantity it may be unable to blend on crossing with a plant possessing the 
character in a well-marked degree. 










With reference to the characters which blend, the accompanying table 
summarizes the results obtained for parental correlation. Mid-parents and self-fertilised 
parents are regarded as comparable. 














Character | Number of Offspring | Average Coefficient of Correlation, Parents and Offspring
Intensity of pelorism (homozygous recessive, mid-parents and self-fertilised parents) | 530 | .520
Intensity of general purple coloration (homozygous dominants, self-fertilised parents) | 529 | .707
Seed-length (self-fertilised parents) | 46 | .378
Spotting (self-fertilised parents) | 716 | .560
Ratio of Corolla (self-fertilised parents) | 713 | .601





The probable errors of these results are reasonably small and the average 
coefficient for the 5 characters is .553 which is not far removed from the average 
coefficient found by Professor Karl Pearson for a large number of characters in a 
variety of different organisms. 


It must be again emphasized that these results are based on self-fertilised 
generations of pedigree plants of known gametic constitution, and on Johannsen's 
theory of pure-lines these parental coefficients should be zero, or at least very 
small. 


The evidence of the present investigation is therefore definitely against any 
general application of the theory of pure-lines and of genotypes of any appreciable 
magnitude, and further it indicates that selective breeding within self-fertilised 
generations of a homogeneous race is capable of modifying that race to a marked 
degree. 


EXPLANATION OF PLATE I. 


Figs. 1 and 2.—Pelorism of maximum intensity; grade 100°. Corollas absent, sessile anthers. 

Figs. 3 and 4.—Perfect pelorism, grade 100°. Corollas joined along their split edges forming a complete 
saucer. Stamens with filaments. 

Fig. 5.—Peloric flower of side-axis ; the axis terminates in an ovary. 

Fig. 6.—Pelorism of grade 100°. Numerous flowers fused irregularly forming a rosette, the axis has 
grown through the crown. 

Figs. 7 and 8.—Incomplete pelorism of main axes, grade 75°. A spiral bending often occurs. 

Fig. 9.—Faintly defined pelorism. When such occurred on the lateral axes the plant was said to 
possess a grade of 25°. Side view, and view from above. 

Fig. 10.—Flowering axis of a conspicuous sport in which practically all the corollas are completely split 
longitudinally into four elongated blades. Nature of inheritance obscure. 


The photographs were kindly taken by Dr Conrad Akerman. 














ON POLYCHORIC COEFFICIENTS OF CORRELATION. 
By KARL PEARSON, F.R.S. and EGON S. PEARSON. 


(1) ONE of the difficulties which are constantly recurring in statistical practice 
is that of the correlation or contingency table in which the two variates are 
classified in broad categories. We may indeed proceed by the method of mean 
square contingency and correct for the grouping of both variates by the class 
index corrections on the assumption that the marginal totals for both variates 
may be assumed to follow approximately normal distributions. Such a procedure 
gives reasonably satisfactory results*, provided the marginal totals are not in very 
unequal groupings and the correlation is not intense (say, .85 and above). The 
polychoric table has been discussed by Ritchie-Scott and he has described a method 
of reaching a polychoric coefficient of correlation from the weighted mean of the 
possible tetrachoric values†. Such a process is, however, so laborious that it can 
hardly establish itself in practice. From the theoretical standpoint, however, 
Ritchie-Scott's paper was of great interest (i) as guiding us by the size of the 
probable errors to discriminate between the valuable and worthless dichotomies in 
tetrachoric determinations of the correlation, (ii) as providing standard values by 
which those obtained by other procedures could be directly tested. 

We shall endeavour to reach in this paper another form of polychoric 
coefficient, that is a correlation coefficient which does use all the information given 
in a polychoric table, but which requires less analysis than Ritchie-Scott's weighted 
mean coefficient. Thus what may be lost in exactness will possibly be repaid by 
practical efficiency. There is another point also of very considerable illustrative 
importance; we desire wherever the data are suitable actually to exhibit in the form 
of a graph the relation between the two variates. This should be possible in the 
case of a polychoric table, and in the past has frequently been done by approximate 
methods of more or less validity. 

We can indeed take such methods as our present starting point as they will 
directly indicate to the reader our line of approach. 

We start with the hypothesis that the marginal totals of our polychoric table 
can be represented on a normal scale. This is no great assumption in itself. If a 
true quantitative scale ever becomes available it can be attached at once and with 
little trouble to the normal scale. To exhibit a variate on a normal scale makes 

* By “reasonably satisfactory results,” we mean that in cases which can be directly checked by the 
product moment method the difference is within the range of practical insignificance as judged by 
probable error. 

† Biometrika, Vol. XII. pp. 93-133. 

‡ Thus in a 3 × 3 table it is possible for two of the corner dichotomies, i.e. those unassociated with 
the diagonal in the sense of the correlation, to have even negative weights, so that they should be omitted 
in finding the mean. 










no greater assumption than when we exhibit a pressure-volume curve as a straight 
line by using a logarithmic scale. 

Now let the polychoric table be such that in the population $N$ under discussion, 
the $s$th category of the first variate $A$ contains $n_{s\cdot}$ individuals and the $s'$th category 
of the second variate $B$ contains $n_{\cdot s'}$ individuals, while the number of individuals 
who combine in the population $N$ the $s$th category of $A$ and the $s'$th category of $B$ 
is $n_{ss'}$. 

Now when we proceed to exhibit the categories of the A-variate on a normal 
scale, the process will give us two important quantities: 

(a) We shall have the ratio of abscissa to standard deviation at the dichotomy 
between each pair of broad categories. 

If $n_{1\cdot}, n_{2\cdot}, n_{3\cdot}, \ldots n_{s\cdot}, \ldots$ be the frequencies of the A-variate for the several 
categories the values of the ratios of abscissae to standard deviation will be specified as 

$$-\infty,\ h_1,\ h_2,\ h_3,\ \ldots\ h_{s-1},\ h_s,\ \ldots\ +\infty.$$

Here $h_{s-1}$, $h_s$ are the values on either side of the category $n_{s\cdot}$ and if there be 
$q$ categories, $n_{1\cdot}$ is bounded by $h_0$ or $-\infty$ and $h_1$, while $n_{q\cdot}$ is bounded by $h_{q-1}$ and 
$h_q$ or $+\infty$. The lower $h$'s will have negative and the upper positive signs and the 
greatest care must be taken to see that the proper signs are given to the values 
of $h$. Similarly if the frequencies of the various categories of the B-variate be 

$$n_{\cdot 1},\ n_{\cdot 2},\ n_{\cdot 3},\ \ldots\ n_{\cdot s'},\ \ldots$$

the values of the ratios of ordinates to standard deviation will be represented by 

$$-\infty,\ k_1,\ k_2,\ k_3,\ \ldots\ k_{s'-1},\ k_{s'},\ \ldots\ +\infty,$$

where $k_{s'-1}$ and $k_{s'}$ give the dichotomies on either side of $n_{\cdot s'}$. 

We may consider the coordinate at the back of the variate A when represented 
on a normal scale to be $x'$, the origin being taken at the mean on the normal scale. 
Hence if the standard deviation be $\sigma_x$, we shall find it convenient to write the 
absolute normal abscissae 

$$x' = \sigma_x x, \qquad h_s' = \sigma_x h_s.$$

Similarly we take $y'$ for the coordinate at the back of the variate B, measured 
from the mean, and write: 

$$y' = \sigma_y y, \qquad k_{s'}' = \sigma_y k_{s'},$$

where $\sigma_y$ is the standard deviation of B. Clearly until a quantitative scale has 
been determined we shall know $h$, $k$, $x$, $y$ but not $h'$, $k'$, $x'$, $y'$, $\sigma_x$ and $\sigma_y$. 

(b) We shall determine the ratio of abscissa to standard deviation, or the ratio 
of ordinate to standard deviation of the centroids or means of the groups $n_{s\cdot}$ 
and $n_{\cdot s'}$. 

Let 

$$H_s = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 h_s^2}, \qquad K_{s'} = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 k_{s'}^2},$$

then the means of the categories $n_{s\cdot}$ and $n_{\cdot s'}$ are determined by 

$$\bar h_{s\cdot} = (H_{s-1} - H_s)\Big/\frac{n_{s\cdot}}{N}, \qquad \bar k_{\cdot s'} = (K_{s'-1} - K_{s'})\Big/\frac{n_{\cdot s'}}{N} \qquad \text{(i)}$$










respectively. The numerical values of $\bar h_{s\cdot}$ and $\bar k_{\cdot s'}$ can be easily ascertained from the 
table published recently of ordinates of normal curve to permilles of area*. Care 
must be taken in every case to give the correct sign to $\bar h_{s\cdot}$ and $\bar k_{\cdot s'}$. 
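The reduction of a set of marginal totals to the normal scale, and the centroids of equation (i), can be sketched as follows. This is only an illustration under stated assumptions: the function name `boundaries_and_centroids` is hypothetical, Python's `statistics.NormalDist` stands in for the printed tables of the normal curve, and the marginal totals are those of the Stature of Father table given later in the paper. No category is assumed to be empty.

```python
from statistics import NormalDist

def boundaries_and_centroids(marginal_totals):
    """Return (h, centroids) for one variate of a polychoric table.

    h[j] is the dichotomy between categories j and j+1 in units of the
    standard deviation; centroids[j] is the mean of category j, eq. (i).
    """
    nd = NormalDist()                      # standard normal scale
    total = sum(marginal_totals)
    # Interior boundaries from the cumulative proportions.
    cum, h = 0, []
    for n in marginal_totals[:-1]:
        cum += n
        h.append(nd.inv_cdf(cum / total))
    # Ordinates H at every boundary; H vanishes at -infinity and +infinity.
    H = [0.0] + [nd.pdf(v) for v in h] + [0.0]
    centroids = [(H[j] - H[j + 1]) / (n / total)
                 for j, n in enumerate(marginal_totals)]
    return h, centroids

fathers = [36, 322, 264, 180, 69, 101, 28]   # marginal totals, N = 1000
h, hbar = boundaries_and_centroids(fathers)
print([round(v, 5) for v in h])
print([round(v, 5) for v in hbar])
```

The signs take care of themselves here because the boundaries are computed from the cumulative area rather than looked up for positive arguments only.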


Now if there were no correlation, $\bar h_{s\cdot}$ and $\bar k_{\cdot s'}$ combined would give the mean of 
the group $n_{ss'}$, and they give a fair approximation to the result if there are numerous 
categories, that is if the range of the categories be small. 

The correlation found from these marginal centroids would then be 

$$r_0 = S\,(n_{ss'}\,\bar h_{s\cdot}\,\bar k_{\cdot s'})/N \qquad \text{(ii)}$$

but as Ritchie-Scott has shown† this $r_0$ diverges much more than $r_\phi$, the mean 
square contingency value, from the true correlation, and considerably more than 
the tetrachoric or polychoric coefficients do. The reason for this is clear and was 
pointed out by one of us in 1913‡. Namely $\bar h_{s\cdot}$ and $\bar k_{\cdot s'}$ do not give the coordinates 
of the mean of $n_{ss'}$. In fact $n_{ss'}\bar h_{s\cdot}\bar k_{\cdot s'}$ is not the contribution of $n_{ss'}$ to the 
product-moment. 

We propose in the present paper to give first the actual contributions of $n_{ss'}$ to 
the means and product-moments of the two variates and then to apply these results 
in order to obtain (a) a polychoric coefficient, and (b) a graph of the relation of the 
two variates. 


The essential assumptions that will be made are the following: 

(i) The marginal totals having been reduced to a normal scale, and the 
correlation being supposed to be $r$, we shall calculate what the contents of the $s$th-$s'$th 
cell would be on the assumption that the frequency surface is the normal surface 
represented by the given correlation and the marginal totals reduced to normal 
scales. We shall further calculate the $x$-moment, the $y$-moment and the $xy$ 
product-moment of the $s$th-$s'$th cell on the same hypothesis. 

(ii) From these data we shall determine the most suitable value to give to $r$, 
so that the actually observed frequencies differ least from those that would be given 
by such a correlation surface. We shall also obtain a formula for calculating the 
mean value of $y$ for the array of B-variates, $n_{s\cdot}$ in number, which corresponds to 
the $s$th category of A. We shall thus be in a position to plot the regression line of 
B on A and test at the same time the closeness with which it fits the thus 
calculated array means, both variates being represented on a normal scale. 

We shall write the real coefficient of correlation of the population $r$, the 
coefficient as found from a single $s$th-$s'$th cell as $r_{ss'}$, and those found from the $n_{s\cdot}$ 
and $n_{\cdot s'}$ arrays as $r_{s\cdot}$ and $r_{\cdot s'}$ respectively. 

$\bar h_{ss'}$, $\bar k_{ss'}$ will be the A- and B-variate means of the $s$th-$s'$th cell and $\pi_{ss'}$ the 
product-moment, per unit of the population, of the frequency in the $s$th-$s'$th cell 
about the mean axes as determined from the marginal totals on the normal scale. 


* See Biometrika, Vol. xu. pp. 426-8. 
† Biometrika, Vol. XII, p. 122. 
‡ Biometrika, Vol. IX, p. 138. 




(2) The developments we require involve the use of the tetrachoric functions. 
The tetrachoric function of the order $t$ is given by* 

$$\tau_t = \frac{1}{\sqrt{t!}}\left(-\frac{d}{dx}\right)^{t-1}\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2} \qquad \text{(iii)}.$$

The tetrachoric functions $\tau_1$ to $\tau_6$ are tabled for positive values of $x$ in Tables 
for Statisticians and Biometricians† to five decimal places. For negative values of $x$ 
tetrachoric functions of an odd order remain unchanged, but those of an even order 
must have their sign as given in the tables reversed. 

It will frequently be needful to take the difference of the tetrachoric functions 
at the boundaries of a marginal category. Thus if $\tau_t(h)$ denotes the value of the 
tetrachoric function for $x = h$, we shall need for the $s$th marginal total 

$$\tau_t(h_s) - \tau_t(h_{s-1}).$$

This difference we shall write, for brevity, 

$$\vartheta_s\tau_t,$$

and in obtaining its numerical value from tables of the tetrachoric functions it is 
essential to remember that $s$ (or $s'$) is supposed to increase in the positive direction 
of the axis of $x$ (or $y$), and that when $h$ (or $k$) is negative attention must be paid 
to changing the sign of the tabled value of $\tau_t$, if $t$ be even. 

The formula for determining the successive tetrachoric functions for a given 
value of $x$ is 

$$\tau_t = p_t\,x\,\tau_{t-1} - q_t\,\tau_{t-2} \qquad \text{(iv)},$$

the values of $p_t$ and $q_t$ being: 

t | p_t | q_t | t | p_t | q_t
2 | .707,1068 | .000,0000 | 14 | .267,2612 | .889,4990
3 | .577,3503 | .408,2483 | 15 | .258,1989 | .897,0851
4 | .500,0000 | .577,3503 | 16 | .250,0000 | .903,6962
5 | .447,2136 | .670,8204 | 17 | .242,5356 | .909,5085
6 | .408,2483 | .730,2968 | 18 | .235,7023 | .914,6592
7 | .377,9645 | .771,5168 | 19 | .229,4157 | .919,2547
8 | .353,5534 | .801,7838 | 20 | .223,6068 | .923,3804
9 | .333,3333 | .824,9578 | 21 | .218,2179 | .927,1051
10 | .316,2278 | .843,2740 | 22 | .213,2007 | .930,4842
11 | .301,5113 | .858,1163 | 23 | .208,5144 | .933,5637
12 | .288,6751 | .870,3880 | 24 | .204,1241 | .936,3819
13 | .277,3501 | .880,7047 | 25 | .200,0000 | .938,9709
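The recurrence (iv), with $p_t = 1/\sqrt{t}$ and $q_t = (t-2)/\sqrt{t(t-1)}$ as stated further on in the paper, lends itself to direct computation. The following is a minimal sketch; the function name `tetrachoric` is hypothetical, and the trial argument is the first dichotomy of the father's stature used in Table I.

```python
import math

def tetrachoric(x, t_max):
    """Return [tau_1(x), ..., tau_t_max(x)] using recurrence (iv)."""
    tau = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)]  # tau_1
    prev2 = 0.0  # multiplied by q_2 = 0, so its value is immaterial
    for t in range(2, t_max + 1):
        p = 1.0 / math.sqrt(t)
        q = (t - 2) / math.sqrt(t * (t - 1))
        nxt = p * x * tau[-1] - q * prev2
        prev2 = tau[-1]
        tau.append(nxt)
    return tau

values = tetrachoric(-1.79912, 5)
print([round(v, 5) for v in values])
```

Because the argument is passed with its sign, the rule about reversing the sign of the even orders for negative $x$ is applied automatically.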





Since $\tau_1 = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}$, it can be found at once from the tables for the ordinates 
of the normal curve, and will indeed have been computed at each division in order 


* The reasons why the tetrachoric functions are tabled with the factor $1/\sqrt{t!}$ are: (a) because this 
factor greatly simplifies our formulae and (b) because a factor of some such order is essential, if we are 
to have manageable tabulated values. As a matter of fact the factor chosen reduces all tetrachoric 
functions to numerical values lying between 0 and 1. 

† Cambridge University Press, pp. 42-51. 






















to determine $\bar h_{s\cdot}$ and $\bar k_{\cdot s'}$. It is then often simpler to work directly with (iv) rather 
than interpolate into the tabled values of the functions. 


In an earlier paper* dealing with the tetrachoric functions one of us has shown 
that if 

$$z = \frac{N}{2\pi\sqrt{1-r^2}}\;e^{-\frac12\,\frac{x^2 - 2rxy + y^2}{1-r^2}}$$

be the equation to a normal correlation surface the variates being measured in the 
standard deviations as units, then 

$$z/N = \tau_1\tau_1' + 2r\,\tau_2\tau_2' + 3r^2\,\tau_3\tau_3' + \ldots + (t+1)\,r^t\,\tau_{t+1}\tau'_{t+1} + \ldots \qquad \text{(v)}$$

where $\tau_t = \tau_t(x)$ and $\tau_t' = \tau_t(y)$. 
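The expansion of the normal surface in tetrachoric functions can be checked numerically against the closed form. The sketch below assumes the recurrence $\tau_t = x\tau_{t-1}/\sqrt{t} - (t-2)\tau_{t-2}/\sqrt{t(t-1)}$ for the tetrachoric functions; the function names and the trial values of $x$, $y$ and $r$ are hypothetical choices made only for illustration.

```python
import math

def tetrachoric(x, t_max):
    """Return [tau_1(x), ..., tau_t_max(x)] by the three-term recurrence."""
    tau = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)]
    prev2 = 0.0
    for t in range(2, t_max + 1):
        nxt = (x * tau[-1] / math.sqrt(t)
               - (t - 2) / math.sqrt(t * (t - 1)) * prev2)
        prev2 = tau[-1]
        tau.append(nxt)
    return tau

def density_series(x, y, r, terms=60):
    """z/N from the series: sum over t >= 0 of (t+1) r^t tau_{t+1}(x) tau_{t+1}(y)."""
    tx, ty = tetrachoric(x, terms), tetrachoric(y, terms)
    return sum((t + 1) * r ** t * tx[t] * ty[t] for t in range(terms))

def density_closed(x, y, r):
    """z/N from the equation of the normal correlation surface."""
    q = (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

series_value = density_series(0.3, -0.7, 0.5)
closed_value = density_closed(0.3, -0.7, 0.5)
print(series_value, closed_value)
```

The number of terms needed grows as $r$ approaches unity, which is the practical reason for the remark later in the paper that higher functions are wanted when the correlation is high.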


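Now in order to proceed further it is needful to determine the following 
integrals: 

$$\int_{h_{s-1}}^{h_s}\tau_t\,dx, \qquad \int_{h_{s-1}}^{h_s}x\,\tau_t\,dx.$$

We can determine these by using (iii) after in the second case integrating by 
parts. We have: 

$$\int_{h_{s-1}}^{h_s}\tau_t\,dx = \int_{h_{s-1}}^{h_s}\frac{1}{\sqrt{t!}}\left(-\frac{d}{dx}\right)^{t-1}\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\,dx = -\left[\frac{1}{\sqrt{t!}}\left(-\frac{d}{dx}\right)^{t-2}\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\right]_{h_{s-1}}^{h_s} = -\frac{1}{\sqrt t}\,\vartheta_s\tau_{t-1} \qquad \text{(vi)}.$$

Again: 

$$\int_{h_{s-1}}^{h_s}x\,\tau_t\,dx = \int_{h_{s-1}}^{h_s}x\,\frac{1}{\sqrt{t!}}\left(-\frac{d}{dx}\right)^{t-1}\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\,dx$$

$$= -\left[\frac{x}{\sqrt{t!}}\left(-\frac{d}{dx}\right)^{t-2}\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\right]_{h_{s-1}}^{h_s} + \int_{h_{s-1}}^{h_s}\frac{1}{\sqrt{t!}}\left(-\frac{d}{dx}\right)^{t-2}\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\,dx$$

$$= -\frac{1}{\sqrt t}\Big[x\,\tau_{t-1}\Big]_{h_{s-1}}^{h_s} + \frac{1}{\sqrt t}\int_{h_{s-1}}^{h_s}\tau_{t-1}\,dx$$

$$= -\frac{1}{\sqrt t}\left[x\,\tau_{t-1} + \frac{1}{\sqrt{t-1}}\,\tau_{t-2}\right]_{h_{s-1}}^{h_s} \qquad \text{(vi) bis}.$$

But by (iv): 

$$x\,\tau_{t-1} = \frac{\tau_t + q_t\,\tau_{t-2}}{p_t},$$

where $p_t = 1/\sqrt t$, $q_t = (t-2)/\sqrt{t(t-1)}$. 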
* Phil. Trans. Vol. 195 a, p. 4, Equation (xiv), with a slight change of notation. In that paper, 
ail % : is written for 7,,, and a ev j vn for 7’,4)° 
V2r V(n+1)! V2r V(n+1)! 













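Thus: 

$$x\,\tau_{t-1} + \frac{1}{\sqrt{t-1}}\,\tau_{t-2} = \sqrt t\,\tau_t + \sqrt{t-1}\,\tau_{t-2}.$$

Accordingly 

$$\int_{h_{s-1}}^{h_s}x\,\tau_t\,dx = -\frac{1}{\sqrt t}\Big[\sqrt t\,\tau_t + \sqrt{t-1}\,\tau_{t-2}\Big]_{h_{s-1}}^{h_s} = -\frac{1}{\sqrt t}\left(\sqrt t\;\vartheta_s\tau_t + \sqrt{t-1}\;\vartheta_s\tau_{t-2}\right) \qquad \text{(vii)}.$$

The latter form throws us back on $\vartheta_s\tau_t$ which will have to be calculated to 
determine the integral in (vi) for the successive values of $t$ and $s$. 

On the other hand a table of 

$$T_{t-1} = \sqrt t\,\tau_t + \sqrt{t-1}\,\tau_{t-2} \qquad \text{(viii)}$$

would be a convenient method of determining the integral and tables of $T$ might 
be easily formed, say up to $T_6$. 

In this case we may write (vii): 

$$\int_{h_{s-1}}^{h_s}x\,\tau_t\,dx = -\frac{1}{\sqrt t}\,\vartheta_sT_{t-1} \qquad \text{(ix)}.$$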
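The two integrals can be verified numerically. The sketch below assumes the definitions $\tau_0(x) = -\int_{-\infty}^{x}\tau_1\,dx$ and $T_{t-1} = \sqrt{t}\,\tau_t + \sqrt{t-1}\,\tau_{t-2}$, and compares a Simpson quadrature with the closed forms; the helper names, the order $t = 4$ and the limits (two neighbouring dichotomies of the father's stature) are choices made only for illustration.

```python
import math
from statistics import NormalDist

def tau_all(x, t_max):
    """[tau_0(x), ..., tau_t_max(x)], with tau_0 = -(area of the normal curve up to x)."""
    nd = NormalDist()
    tau = [-nd.cdf(x), nd.pdf(x)]
    for t in range(2, t_max + 1):
        p = 1.0 / math.sqrt(t)
        q = (t - 2) / math.sqrt(t * (t - 1))
        tau.append(p * x * tau[t - 1] - q * tau[t - 2])
    return tau

def big_t(x, t_max):
    """[T_0(x), ..., T_{t_max-1}(x)], where T_{t-1} = sqrt(t) tau_t + sqrt(t-1) tau_{t-2}."""
    tau = tau_all(x, t_max)
    out = [tau[1]]                       # T_0 = tau_1, the case t = 1
    for t in range(2, t_max + 1):
        out.append(math.sqrt(t) * tau[t] + math.sqrt(t - 1) * tau[t - 2])
    return out

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of strips."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

a, b, t = -0.36381, 0.31074, 4
lhs_plain = simpson(lambda x: tau_all(x, t)[t], a, b)
rhs_plain = -(tau_all(b, t)[t - 1] - tau_all(a, t)[t - 1]) / math.sqrt(t)
lhs_x = simpson(lambda x: x * tau_all(x, t)[t], a, b)
rhs_x = -(big_t(b, t)[t - 1] - big_t(a, t)[t - 1]) / math.sqrt(t)
print(lhs_plain, rhs_plain)
print(lhs_x, rhs_x)
```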

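We are now in the position to compute all the requisite integrals we need; if 
we write $\tilde n_{ss'}$ for the contents of the $s$th-$s'$th cell, then on the supposition that the 
surface is normal, has correlation $r$ and follows the actual marginal frequencies, we 
have: 

$$\frac{\tilde n_{ss'}}{N} = \int_{h_{s-1}}^{h_s}\!\int_{k_{s'-1}}^{k_{s'}}\frac{z}{N}\,dx\,dy = \vartheta_s\tau_0\,\vartheta_{s'}\tau_0' + r\,\vartheta_s\tau_1\,\vartheta_{s'}\tau_1' + r^2\,\vartheta_s\tau_2\,\vartheta_{s'}\tau_2' + \ldots + r^p\,\vartheta_s\tau_p\,\vartheta_{s'}\tau_p' + \ldots \qquad \text{(x)},$$

$$\frac{\tilde n_{ss'}\,\bar h_{ss'}}{N} = \int_{h_{s-1}}^{h_s}\!\int_{k_{s'-1}}^{k_{s'}}\frac{x\,z}{N}\,dx\,dy = \vartheta_sT_0\,\vartheta_{s'}\tau_0' + r\,\vartheta_sT_1\,\vartheta_{s'}\tau_1' + r^2\,\vartheta_sT_2\,\vartheta_{s'}\tau_2' + \ldots + r^p\,\vartheta_sT_p\,\vartheta_{s'}\tau_p' + \ldots \qquad \text{(xi)},$$

$$\frac{\tilde n_{ss'}\,\bar k_{ss'}}{N} = \int_{h_{s-1}}^{h_s}\!\int_{k_{s'-1}}^{k_{s'}}\frac{y\,z}{N}\,dx\,dy = \vartheta_s\tau_0\,\vartheta_{s'}T_0' + r\,\vartheta_s\tau_1\,\vartheta_{s'}T_1' + r^2\,\vartheta_s\tau_2\,\vartheta_{s'}T_2' + \ldots + r^p\,\vartheta_s\tau_p\,\vartheta_{s'}T_p' + \ldots \qquad \text{(xii)},$$

$$\frac{\tilde n_{ss'}\,\pi_{ss'}}{N} = \int_{h_{s-1}}^{h_s}\!\int_{k_{s'-1}}^{k_{s'}}\frac{xy\,z}{N}\,dx\,dy = \vartheta_sT_0\,\vartheta_{s'}T_0' + r\,\vartheta_sT_1\,\vartheta_{s'}T_1' + r^2\,\vartheta_sT_2\,\vartheta_{s'}T_2' + \ldots + r^p\,\vartheta_sT_p\,\vartheta_{s'}T_p' + \ldots \qquad \text{(xiii)}.$$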
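Series (x) for the theoretical cell contents can be sketched in code as follows. Everything here is an illustrative assumption rather than the authors' procedure: the function names are hypothetical, the series is cut off at $r^{40}$, `statistics.NormalDist` replaces the printed tables, and the marginal totals are those of the father and son stature table used later in the paper.

```python
import math
from statistics import NormalDist

def tau_at_boundaries(marginal_totals, p_max):
    """Rows [tau_0..tau_p_max] at -infinity, each interior dichotomy, +infinity."""
    nd, total = NormalDist(), sum(marginal_totals)
    rows = [[0.0] * (p_max + 1)]                 # every tau vanishes at -infinity
    cum = 0
    for n in marginal_totals[:-1]:
        cum += n
        x = nd.inv_cdf(cum / total)
        tau = [-cum / total, nd.pdf(x)]          # tau_0 and tau_1
        for t in range(2, p_max + 1):
            tau.append(x * tau[t - 1] / math.sqrt(t)
                       - (t - 2) / math.sqrt(t * (t - 1)) * tau[t - 2])
        rows.append(tau)
    rows.append([-1.0] + [0.0] * p_max)          # tau_0 = -1 at +infinity
    return rows

def differences(rows):
    """For each category, tau_p at its upper boundary minus tau_p at its lower."""
    return [[hi - lo for lo, hi in zip(rows[s], rows[s + 1])]
            for s in range(len(rows) - 1)]

def expected_cells(col_totals, row_totals, r, p_max=40):
    """Theoretical cell frequencies from series (x), truncated at r**p_max."""
    total = sum(col_totals)
    dx = differences(tau_at_boundaries(col_totals, p_max))
    dy = differences(tau_at_boundaries(row_totals, p_max))
    return [[total * sum(r ** p * a[p] * b[p] for p in range(p_max + 1))
             for a in dx] for b in dy]

fathers = [36, 322, 264, 180, 69, 101, 28]   # column totals
sons = [34, 301, 284, 137, 105, 98, 41]      # row totals
cells = expected_cells(fathers, sons, 0.5)
column_sums = [sum(row[j] for row in cells) for j in range(len(fathers))]
print([round(v, 3) for v in column_sums])
```

The column sums reproduce the marginal totals whatever the value of $r$, because every term of the series after the first vanishes when summed over all the categories of one variate.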


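It is desirable to say a few words about the functions $\tau_0$ and $T_0$, which may at 
first present difficulties to the reader. $-\vartheta_s\tau_0$ clearly stands for the integral 

$$\int_{h_{s-1}}^{h_s}\frac{1}{\sqrt{2\pi}}\,e^{-\frac12x^2}\,dx, \quad \text{i.e.} \quad \int_{h_{s-1}}^{h_s}\tau_1\,dx,$$

and is therefore simply $n_{s\cdot}/N$. 

Similarly $-\vartheta_{s'}\tau_0' = n_{\cdot s'}/N$. 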






















Next clearly $-\vartheta_sT_0$ stands for 

$$\int_{h_{s-1}}^{h_s}\tau_1\,x\,dx = \int_{h_{s-1}}^{h_s}\frac{1}{\sqrt{2\pi}}\,x\,e^{-\frac12x^2}\,dx = -\left[\frac{1}{\sqrt{2\pi}}\,e^{-\frac12x^2}\right]_{h_{s-1}}^{h_s} = -\vartheta_s\tau_1,$$

or $\vartheta_sT_0 = \vartheta_s\tau_1$; 
which is precisely the value given by (viii). 

Thus (viii) is shown to be correct even for this special case although a form 
like (vi) bis through which it is reached shows difficulties. 

Similarly $\vartheta_{s'}T_0' = \vartheta_{s'}\tau_1'$*. 

The remainder of the $\tau$'s knowing $\tau_0$ and $\tau_1$ come directly from (iv) and the $T$'s 
are always given by (viii). 

Now it is clear that (x) to (xiii) provide a large number of ways of 
determining $r$. We might find $r$, i.e. $r_{ss'}$, from the single cell by writing in (x) $n_{ss'}$ for $\tilde n_{ss'}$. 
Or we may find 

$$\bar h_{s\cdot} = \frac{1}{n_{s\cdot}}\,S_{s'}\,(n_{ss'}\,\bar h_{ss'}) = \frac{N}{n_{s\cdot}}\,S_{s'}\left(\frac{n_{ss'}}{\tilde n_{ss'}}\Big\{\vartheta_sT_0\,\vartheta_{s'}\tau_0' + r\,\vartheta_sT_1\,\vartheta_{s'}\tau_1' + \ldots + r^p\,\vartheta_sT_p\,\vartheta_{s'}\tau_p' + \ldots\Big\}\right) \qquad \text{(xiv)},$$

where $\tilde n_{ss'}$ is given by (x). But $\bar h_{s\cdot}$ is the known centroid of the $n_{s\cdot}$ marginal total, 
and accordingly the above is an equation to find $r$, i.e. $r_{s\cdot}$, from a given column of the 
table. 

If we use this value of $r_{s\cdot}$ in (x) and (xii) to find $\tilde n_{ss'}$ and $\bar k_{ss'}$, we obtain the 
theoretical cell frequency and $y$-mean of the cell as found from a column. 

Now sum $n_{ss'}\bar k_{ss'}$ for every value of $s'$ and we find $\bar k_{s\cdot}$ the $y$ mean of a column 
depending on the data as found from the column, i.e. 

$$\bar k_{s\cdot} = \frac{N}{n_{s\cdot}}\,S_{s'}\left(\frac{n_{ss'}}{\tilde n_{ss'}}\Big\{\vartheta_s\tau_0\,\vartheta_{s'}T_0' + r\,\vartheta_s\tau_1\,\vartheta_{s'}T_1' + \ldots + r^p\,\vartheta_s\tau_p\,\vartheta_{s'}T_p' + \ldots\Big\}\right) \qquad \text{(xv)},$$

where $n_{ss'}$ is the observed cell frequency and $\tilde n_{ss'}$ the frequency found by (x) when 
we insert the value of $r$ as found from (xiv). We are thus in a position theoretically 
to determine on a normal scale the mean of a column from the correlation actually 
determined from that column. This would be the ideal method of determining the 
mean of a row or column; but it would involve a great deal of hard work, as with 
the two regression curves we should need to find $r$ for every row and column by an 
equation of a high order. Hence in most cases we are likely to content ourselves 
by finding $r$ for the whole table and then use this value in (x) to determine $\tilde n_{ss'}$ 
and in (xv) to find the mean of the array. $\bar k_{s\cdot}$ plotted to the known $\bar h_{s\cdot}$ on the 
normal scale will give the regression curve. 


* We can thus take $T_0 = \tau_1$ and $T_0' = \tau_1'$. 













The question now arises as to the manner in which we can find $r$ for the whole 
table most effectively. 

Clearly we might assume the product-moment components from (xiii) and sum 
for all cells. We should have 

$$S_{s,s'}\left(\frac{n_{ss'}\,\pi_{ss'}}{N}\right) = r,$$

since the coordinates are measured from the means in terms of the standard 
deviations as units. 

Hence substituting from (xiii) we have: 

$$r = S_{s,s'}\left(\frac{n_{ss'}}{\tilde n_{ss'}}\Big\{\vartheta_sT_0\,\vartheta_{s'}T_0' + r\,\vartheta_sT_1\,\vartheta_{s'}T_1' + \ldots + r^p\,\vartheta_sT_p\,\vartheta_{s'}T_p' + \ldots\Big\}\right) \qquad \text{(xvi)}.$$

Here $\tilde n_{ss'}$ must be substituted from (x) and we have finally 

$$r = S_{s,s'}\left(\frac{n_{ss'}}{N}\cdot\frac{\vartheta_sT_0\,\vartheta_{s'}T_0' + r\,\vartheta_sT_1\,\vartheta_{s'}T_1' + \ldots + r^p\,\vartheta_sT_p\,\vartheta_{s'}T_p' + \ldots}{\vartheta_s\tau_0\,\vartheta_{s'}\tau_0' + r\,\vartheta_s\tau_1\,\vartheta_{s'}\tau_1' + \ldots + r^p\,\vartheta_s\tau_p\,\vartheta_{s'}\tau_p' + \ldots}\right) \qquad \text{(xvi) bis}.$$

This equation based upon the product-moment method of finding $r$ is clearly 
likely to be very complicated, and although it can be proved that the 
product-moment method is the “best” method of finding $r$ when we are dealing with 
a series of quantitatively measured individuals, we have no certainty that it is the 
best method in the present case of broad categories. It may indeed be questioned 
whether another method now to be considered cannot be shown to be better or 
at least equally efficacious. 


Let us consider for a moment what we have in view. We observe $n_{ss'}$ as the 
frequency of the $s$th-$s'$th cell; we find that with a given correlation $r$ the frequency 
of this cell would be $\tilde n_{ss'}$ on the assumption that the frequency surface is the normal 
frequency surface corresponding to the observed marginal totals. Accordingly the 
most probable value to give to $r$ would be that which made 

$$\chi^2 = S_{s,s'}\,\frac{(n_{ss'} - \tilde n_{ss'})^2}{\tilde n_{ss'}} = \text{minimum},$$

or, what is the same thing, 

$$S_{s,s'}\left(\frac{n_{ss'}^2}{\tilde n_{ss'}}\right) = \text{minimum}.$$

This leads us, differentiating with regard to $r$, to 

$$S_{s,s'}\left(\left(\frac{n_{ss'}}{\tilde n_{ss'}}\right)^2\frac{d\tilde n_{ss'}}{dr}\right) = 0,$$

or, writing at length, our equation for $r$ is: 

$$S_{s,s'}\left(\left(\frac{n_{ss'}}{N}\right)^2\frac{\vartheta_s\tau_1\,\vartheta_{s'}\tau_1' + 2r\,\vartheta_s\tau_2\,\vartheta_{s'}\tau_2' + \ldots + p\,r^{p-1}\,\vartheta_s\tau_p\,\vartheta_{s'}\tau_p' + \ldots}{\left(\vartheta_s\tau_0\,\vartheta_{s'}\tau_0' + r\,\vartheta_s\tau_1\,\vartheta_{s'}\tau_1' + \ldots + r^p\,\vartheta_s\tau_p\,\vartheta_{s'}\tau_p' + \ldots\right)^2}\right) = 0 \qquad \text{(xvii)}.$$

Neither (xvi) nor (xvii) are very readily solved. Probably the easiest way will 
be to obtain an approximate value of $r$ by existing methods either from a good 
fourfold table, or from contingency, and then evaluate (xvi) or (xvii) for values of 
$r$, one well above and one well below this result, so that the real value of $r$ lies 







between the two. A linear interpolation will probably suffice in most cases to 
determine $r$ with sufficient accuracy. 

It will be observed that what we are trying to do is to fit a normal correlation 
surface to a series of cell frequencies. We may do this by equating 
product-moments, or actual cell frequencies properly weighted. The factors $\dfrac{n_{ss'}}{\tilde n_{ss'}}$ and $\left(\dfrac{n_{ss'}}{\tilde n_{ss'}}\right)^2$ 
come into our equations as a form of weights. When $n_{ss'}$ is small as compared with 
$\tilde n_{ss'}$ that cell will contribute less to the general equations for $r$, and when $n_{ss'}$ is 
large as compared with $\tilde n_{ss'}$, the contribution will be considerable. If the observed 
results were closely normal then $n_{ss'}$ would be nearly $\tilde n_{ss'}$. If we might assume the 
differences of $n_{ss'}$ and $\tilde n_{ss'}$ so small as to be negligible we should have: 

$$r = S_{s,s'}\left(\vartheta_sT_0\,\vartheta_{s'}T_0' + r\,\vartheta_sT_1\,\vartheta_{s'}T_1' + \ldots + r^p\,\vartheta_sT_p\,\vartheta_{s'}T_p' + \ldots\right) \qquad \text{(xvi) ter},$$

and 

$$0 = S_{s,s'}\left(\vartheta_s\tau_1\,\vartheta_{s'}\tau_1' + 2r\,\vartheta_s\tau_2\,\vartheta_{s'}\tau_2' + \ldots + p\,r^{p-1}\,\vartheta_s\tau_p\,\vartheta_{s'}\tau_p' + \ldots\right) \qquad \text{(xvii) bis},$$

instead of (xvi) bis and (xvii). These equations it will be found are identically 
satisfied. Hence our values for $r$ from (xvi) and (xvii) depend on $\tilde n_{ss'}$ differing 
from $n_{ss'}$. 

(3) We now proceed to illustrate the application of these results. 

Stature of Father and Son. 

The following table gives a correlation table for the inheritance of stature in 
Father and Son made up in broad categories corresponding to eye-colour groups*. 
Upon this material we shall be able to test our correlations and our graph against 
those found by definite numerical groupings. 


Stature of Father (Broad Categories). 

Stature of Son (Broad Categories) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Totals
1 | 4 | 22 | 7 | - | 1 | - | - | 34
2 | 23 | 154 | 84 | 26 | 8 | 6 | - | 301
3 | 8 | 87 | 75 | 66 | 22 | 24 | 2 | 284
4 | 1 | 29 | 36 | 37 | 14 | 14 | 6 | 137
5 | - | 18 | 27 | 26 | 11 | 18 | 5 | 105
6 | - | 9 | 26 | 19 | 7 | 29 | 8 | 98
7 | - | 3 | 9 | 6 | 6 | 10 | 7 | 41
Totals | 36 | 322 | 264 | 180 | 69 | 101 | 28 | 1000





The positive direction of « is from left to right and of y vertically downwards. 
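The determination of $r$ from the equation obtained by making $S(n_{ss'}^2/\tilde n_{ss'})$ a minimum can be sketched for this table as follows. Several things here are assumptions and not the authors' own working: the function names are hypothetical, the series is truncated at $r^{40}$, bisection between two trial values is swapped in for the single linear interpolation suggested in the text, and a few cell entries that are illegible in the scan have been restored from the marginal totals.

```python
import math
from statistics import NormalDist

TABLE = [  # rows: son's category 1..7; columns: father's category 1..7
    [4, 22, 7, 0, 1, 0, 0],
    [23, 154, 84, 26, 8, 6, 0],
    [8, 87, 75, 66, 22, 24, 2],
    [1, 29, 36, 37, 14, 14, 6],
    [0, 18, 27, 26, 11, 18, 5],
    [0, 9, 26, 19, 7, 29, 8],
    [0, 3, 9, 6, 6, 10, 7],
]

def theta(marginal_totals, p_max):
    """Differences of tau_0..tau_p_max across each category's two boundaries."""
    nd, total = NormalDist(), sum(marginal_totals)
    rows, cum = [[0.0] * (p_max + 1)], 0       # all tau vanish at -infinity
    for n in marginal_totals[:-1]:
        cum += n
        x = nd.inv_cdf(cum / total)
        tau = [-cum / total, nd.pdf(x)]
        for t in range(2, p_max + 1):          # three-term recurrence
            tau.append(x * tau[t - 1] / math.sqrt(t)
                       - (t - 2) / math.sqrt(t * (t - 1)) * tau[t - 2])
        rows.append(tau)
    rows.append([-1.0] + [0.0] * p_max)        # tau_0 = -1 at +infinity
    return [[b - a for a, b in zip(rows[i], rows[i + 1])]
            for i in range(len(rows) - 1)]

def score(r, table, p_max=40):
    """Sum over cells of (observed / theoretical)^2 times d(theoretical)/dr."""
    total = sum(map(sum, table))
    dy = theta([sum(row) for row in table], p_max)
    dx = theta([sum(col) for col in zip(*table)], p_max)
    out = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            terms = [dx[j][p] * dy[i][p] for p in range(p_max + 1)]
            nbar = total * sum(r ** p * c for p, c in enumerate(terms))
            slope = total * sum(p * r ** (p - 1) * c
                                for p, c in enumerate(terms) if p)
            out += (n / nbar) ** 2 * slope
    return out

def solve_r(table, lo=0.3, hi=0.7, steps=40):
    """Bisection between two trial values assumed to bracket the root."""
    f_lo = score(lo, table)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = score(mid, table)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_hat = solve_r(TABLE)
print(round(r_hat, 3))
```

The trial values .3 and .7 are arbitrary; in practice they would be chosen on either side of a preliminary value from a fourfold table or from contingency, as the text recommends.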

It will suffice to take the $T$'s to five decimal figures but it will be needful to go 
further with the $\tau$'s if the $T$'s are to be taken correctly to five figures from (viii). 
The general reduction formula for the $T$'s is: 

$$T_t(x) = \frac{T_{t-1}(x)\,(t-2)\,\sqrt{t-1}\;x^3 - T_{t-2}(x)\,(t-3)\left(x^2(t-1)+1\right)}{\sqrt{t(t-1)}\,\left(x^2(t-2)+1\right)} \qquad \text{(xviii)}.$$

[Equation (xviii) bis is not legible in the source scan.] 


* See Biometrika, Vol. IX. p. 220.









Hence if $T_1$ and $T_2$ be found accurately the remaining $T$'s can be determined as
accurately as we please without reference to the $\tau$'s.

But,

$$T_1 = \tau_0 + h\,\tau_1 \quad \text{(xix)},$$

$$T_2 = \frac{1 + h^2}{\sqrt{2}}\,\tau_1 \quad \text{(xx)}.$$

Hence the tables of ordinates and areas of the normal curve readily provide $T_1$
and $T_2$ to seven decimal places, and (xviii) provides the higher $T$'s. These were cut
down to five figures and an approximate check on their values obtained by (viii).

As a matter of fact if $r$ is of the order ·50 we cannot hope to obtain more than
three figure accuracy in $r$ without going to higher $\tau$- and $T$-functions than the
sixth, especially when using (xvii). But three figures in the correlation are usually
adequate and the labour of computing is much increased if higher functions are
used. Such must, however, be used if the correlation be sensibly higher than ·50.
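
As a numerical cross-check on the $\tau$-functions tabulated for the $x$-variate, the following sketch recomputes them for the first dividing line. It assumes the standard definition of the tetrachoric functions, $\tau_p(h) = He_{p-1}(h)\,z(h)/\sqrt{p!}$ with $z$ the normal ordinate, which is stated outside this section; the function name and the use of Python's `statistics.NormalDist` are illustrative choices and not part of the original computation.

```python
from math import sqrt
from statistics import NormalDist

def tetrachoric_functions(h, order=6):
    """Return [tau_1(h), ..., tau_order(h)].

    Assumed definition: tau_p(h) = He_{p-1}(h) * z(h) / sqrt(p!),
    where z is the standard normal ordinate, computed by the recurrence
        tau_p = h*tau_{p-1}/sqrt(p) - (p-2)*tau_{p-2}/sqrt(p*(p-1)).
    """
    taus = [NormalDist().pdf(h)]          # tau_1 is the ordinate H
    for p in range(2, order + 1):
        previous = taus[-2] if p > 2 else 0.0
        taus.append(h * taus[-1] / sqrt(p)
                    - (p - 2) * previous / sqrt(p * (p - 1)))
    return taus

# First dividing line of the x-variate: 36 fathers in 1000 fall below it.
h1 = NormalDist().inv_cdf(0.036)
taus = tetrachoric_functions(h1)
print(round(h1, 5), [round(t, 5) for t in taus])
```

The printed values can be compared with the first finite column of Table I, where the same six functions are given to five figures.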

The following table gives the $\tfrac{1}{2}(1+\alpha)$'s, $h$'s, $H$'s, $\bar{x}$'s, $\tau$'s, $\mathfrak{S}\tau$'s, $T$'s, and $\mathfrak{S}T$'s for
the $x$-variate.


TABLE I.























| \ 
4(1+a) 0 | 036 “358 622 | +802 “871 | 972 | 1-000 
| | | | ie Sie i 
h —o |-1'79912 —°36381 +°31074 | +°84879 (41713113 41°91104, + 
H=r, 0 07908 | *37340 38014 | ‘27827 | -:2104% "06425 | 0 
7 se 
Bape = =, —2°19667 | — 91404 | — "02553 | +°56594 + 98333 + 1744723 | +2°29464 
* 0 — 036 — °358 | — "622 | —*802 -— ‘871 -- ‘972 | —] 
™1=T) 0 +°07908 | +°37340 | +°38014 | +°27827 | +°21042 + °06425 | (a) 
rT. 0 —10060 — 09606 | +°08353 | +°16701 | +°16830 + ‘08682 0 
T3 0 + 07221  -- 13226 | —*14021 | — 03176 | +°02401 +°06952 0 
v4 Lb © —*00688 | +°07952 | — 07001 | —-10990 | — -08359 , + 01634 | 0 
75 | O ~-04291 | +-07579 | + 08432 | — -02041 | —-05839 | — 03270 0 
7%; | 9 | +°03654 | — 06933 | +°06182 | +-07319 | +-03408 | - 03744 0 
Ir | | —*036 — 322 | — "264 — ‘180 -*069 | —-101 — 028 | 
Sr, | | +°07908 +°29432 | +°00674 | —-°10187 | —-06785 | —+14617 | —-06425 | 
Ir, — 10060 +:°00454 | +°17959 | +-°08348 | +-00129 | — -08148 | — -08682 
Irs + 07221  — -20447 | — 00795 | +°10845 | +°05577 | + °04555 | — 06956 
Ir, | | —*00688 + °08640 | —-14953 | —-03989 | + -02631 | +-09993 | — -01634 
Its — 04291 | +°11870 | + 00853 | —+10473 | — -03798 | +-02569 | + 03270 
Ir | | +°03654  —-10587 | +°13115 | +-01137 | —-03911 | —-07152 | +-03744 
T, | o — ‘17827 | —°49385 | — 50388 | —*56581 | — -63299 | — -84922 -1 
T, | 0 +°23690 | +°29898 | + °29475 | + °33853 | +°33915 | +°21135 | 0) 
T. | O | —*18799 —-00734 | +:00466 | +-06947 | +°12432 | +°18307 | 0 
7’; O | +°04848 —-09506 | — 09186 | —-10916 | —-08255 | +-06601 0 
7; 0 | +°07412  +4-07799 | —-00511 | —-06648 | --10343 | —-05518 | 0 
T; 0 — 08325 + °'05616 | +°05364 | +°05379 | +°01471 | — 08491 | 0 
IT, | + 07908 | +°29432 | + ‘00674 | —°10187 | — 06786 | -—-14617 ; — 06425 
ST; ~ *17827 | —°31558 | — 01003 | — 06193 | — 06718 | — 21622 | —-15078 
ST, | | +°23690 | +°06208 | —-00422 | +°04377 | + 00063 | —°12780 | — °21135 
IT, | | —*18799 | +°18065 | +-01200 | + 06481 | +-05485 | +°05874 | — ‘18307 
ST; | +°04848 | — "14354 + 00320 | —-01731 | +-02662 | +°14856 | — 06601 
IT; | +°07412 | - 06613  - -01310 | —-06137 | — -03695 | + 04825 | 4+°05518 | 
IT, — 08325 | +°13941 | — 00253 | +°00015 | — 03907 | — -09962 | + °08491 










The following table gives the corresponding quantities $\tfrac{1}{2}(1+\alpha')$'s, $k$'s, $K$'s, $\bar{y}$'s,
$\tau'$'s, $\mathfrak{S}\tau'$'s, $T'$'s and $\mathfrak{S}T'$'s for the $y$-variate.





TABLE II.
4 (1+a’) 0 | + “B35 | “619 ‘156 | -861 | -959 1°000 
: -o — - PP | + °30286 + 69349 +1°08482 +1°73920 +a 
K=r, | O {+ °07545 36431 | +°38106 | +°31367 |+ °22149 + -08792 0 
| ¥e= rt — 221916 95968 —-05896 +4°49188 +°87789 + 1°36301 | +2°14436 
$(ay — a’, 
To | 9 — 034 — *335 — 619 — 756 — 861 — 959 —] 
™1=Ty 0 +°07545 | +°36431 +4°38106 +°31367 | +°22149 | + 08792 0 
Ts 2 — ‘09737 | -°10978 | +°08160 +4°15382 +-°16990 | +-°10812 | 0 
Ts, | O +°07179 | —°12172 | — 14130 | — 06647 | + °01599 | +-07268 0 
Ty i ie -— 00929 | +°08932 | —-06851 --11185 —-08942 | +°00077 0 
Ts, i —°04057 | +°06463  +°08551 +-00990 --05411  --04815 0 
TT; Le +°03702 | —-07647  +-06061 +-°08449 +4-°04134 —-03475 0 
Oro | — 034 — 301 — *284 —°137 — "105 — 098 — “041 
Sr,’ | 07545 | +°28886 +°01675 —-06739 | —-09218 | —-13357 | — 08792 
Sr, | | —-09737 | —-01241 | +°19138 | +°07222 | +-01608 | — 06178 | -— -10812 
Irs | +°07179 —°19351 -— 01958 +-°07483 | +°08246 | +°05669 , — -07268 
Ir, | | —*00929 | +°09861 | — °15783 | — 04334 | + -02243 | +-09019 | —-00077 
Sex | | —:04057 | +°10520 | +-02088 | —-07561 | —-06401 | +-00596 | +-04815 
Ing | + 03702 | —°11349 | +°13708 +-02388 —-04315 | —-07609 | +-03475 
rT? | 0 —‘17170 | — -49025 | — -50360 | —-53847 | —-62073 | —-80610 —1 
7 | O +°23105 +°30439 | +-29416 | +°-32847 | +-34094  +-25021 0 
T. | © — 18723 | —-01151 | +-00432 | +°-04271 | +°11544 +-18882 0 
T; 1) + 05286 | —-09892 | —-09140 | —-11081 | —-08901 | +-03768 0 
T; O + °06989 | + °01240 | —°00474 | — 04316 | —-09869 | — -08340 0 
7! 0 — 08412 | +°05897 | +°05326 | +-06263 | + -02276 | —-08010 0 
$7.! +°07545 | +°28886  +°01675 | —-06739 —-09218 —-13358 | — 08792 
97" — ‘17170 | —°31855 — -01334 | —-03488 | — -08225 | —-18537 | —-19391 
97; +°23105 | +°07334 —-01023 | +°03431 +-01247  -—-09072 | — -25021 
97; —*18723 | +°17572  +°01583 | +°03839 +°07273 + °07338 | —-18882 
$T, + 05286 | —°15178 + 00752 | — 01941 | +°02179 , +°12670 | —-03768 
x + 06989 | — 05749 —-O1714 | —-03842 | — -05553 | +-10529 | + 08340 
3T — ‘08412 | +°14309 | — -00571 | +°00937 -— 03988 —-10286 | | $-0s010 





| 


| 








From Tables I and II we can find from (x) the value of $\bar{n}_{ss'}/N$ for any given
value of $r$, and by equating $\bar{n}_{ss'}/N$ to $n_{ss'}/N$ we should have an equation to determine
the correlation $r$ from that cell alone. The weighted mean of these 49 $r$'s would
be Ritchie-Scott's polychoric correlation coefficient. But the labour would be
immense*.

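We are now in a position to give the product of $\mathfrak{S}_s\tau_p\,\mathfrak{S}_{s'}\tau_p'$: see Table III, p. 138.

There are certain checks on the accuracy of this table, namely

$$S_{ss'}\left(\mathfrak{S}_s\tau_p\,\mathfrak{S}_{s'}\tau_p'\right) = 0 \text{ except for } p = 0, \text{ when it } = 1.$$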
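
The check holds because the category differences of each $\tau$-function telescope to zero over a whole margin, while the marginal proportions of order zero sum to unity. A minimal sketch, assuming the standard tetrachoric recurrence (defined outside this section) and taking the marginal totals from the father and son stature table, rebuilds the products for every order $p$ and confirms both the check sums and the seven coefficients belonging to the cell with father's category 3 and son's category 2. All function and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

FATHERS = [36, 322, 264, 180, 69, 101, 28]   # column totals (x-variate)
SONS = [34, 301, 284, 137, 105, 98, 41]      # row totals (y-variate)

def tau(h, p):
    """Tetrachoric function tau_p(h), p >= 1 (assumed standard definition)."""
    prev, cur = 0.0, NormalDist().pdf(h)
    for n in range(2, p + 1):
        prev, cur = cur, h * cur / sqrt(n) - (n - 2) * prev / sqrt(n * (n - 1))
    return cur

def category_differences(totals, p):
    """Difference of tau_p across each broad category of one margin."""
    n = sum(totals)
    if p == 0:
        # Order 0 is the marginal proportion; the paper's tau_0 carries a
        # minus sign on both variates, which cancels in every product.
        return [t / n for t in totals]
    cuts, running = [], 0
    for t in totals[:-1]:
        running += t
        cuts.append(NormalDist().inv_cdf(running / n))
    edges = [0.0] + [tau(h, p) for h in cuts] + [0.0]   # tau_p is 0 at +/- infinity
    return [b - a for a, b in zip(edges, edges[1:])]

check_sums, cell_32 = [], []
for p in range(7):
    dx = category_differences(FATHERS, p)
    dy = category_differences(SONS, p)
    check_sums.append(sum(a * b for a in dx for b in dy))
    cell_32.append(dx[2] * dy[1])          # father category 3, son category 2

print([round(s, 6) for s in check_sums])
print([round(c, 6) for c in cell_32])
```

Because this sketch works in full floating-point precision, its check sums are zero to rounding error, whereas the paper's five-figure arithmetic leaves residuals in the sixth decimal place.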
* We are not underrating the large amount of arithmetic of the present process. It is not likely to
be often repeated, and the sole purpose of publishing all these tables for an individual case is to impress
the reader with that fact; while at the same time illustrating the actual numerical processes. The
amount of arithmetic, great as it is, is relatively small compared with that of solving and weighting the
resulting $r$'s in the case of a 49-cell table.













TABLE III. 
Values of $\mathfrak{S}_s\tau_p\,\mathfrak{S}_{s'}\tau_p'$.








0 
1 
2 
3 
4 
5 
6 


1 
2 
3 
4 
5 


6 


Doe & We © 


a 


| 
| 


| 
| 
a 


| 


| 


| 
| 


s=1 s=2 e=8 | 
+ 001,224 | + 010,948 + -008,976 | 
+ po "067 | + °022,206 + °000,509 
| + 009,795 | — 000,442 | — ‘017,487 
i+ ‘005,184 | - °014,679 — 000,571 
| + ‘O00, 064 | — 000,803 | + :001,389 
+001 ,741 | — °004,816 | — -000,346 
| + 001,353 —°003,919 + 004,855 
| 4 010,836 + °096,922 + ‘079,464 
+ 022, 843 + :°085,017 + 001,947 
+ °001,248 —-°000,056 — -002,229 
- (013,973 | + °039,567 + ‘001,538 
- (000,678 | + °008,520 | — 014,745 
| — 004,514 | + °012,487 | + ‘000,897 
— 004,147 + °012,015 | — ‘014,884 
+ ‘O10,224 | + -091,448 | + 074,976 
| + 001,325 | +°004,930 | + -000,113 
| ~ 019,253 | + 000,869 | + -034,370 
— 001,414 | + 004,004 | + ‘000,156 
+ °001,086 | — ‘013,637 | + 023,600 
— ‘000,896 | + °002,478 |+ 000,178 
+ 005,009 | — 014,513 |+-017,978 
+ 004,932 | + 044,114 | + 036,168 
— ‘005,329 | — 019,834 | — 000,454 
| — ‘007,265 + 000,328 + -012,970 
+ 005,403 - -015,300 | — 000,595 
+ °000,298 | — 003,745 | + 006,481 
+ 003,244 — -008,975 | — -000,645 | 
+ 000,873 | — 002,528 | + 003,132 | 
| Bette sab 
+ °003,780 | + °033,810 | + °027,720 | 
— ‘007,290 | — ‘027,130 | — -000,621 | 
— 001,678 | + 000,073 | + 002,888 
+ 005,954 | — 016,861 000,656 
| — 000,154 | +°001,938 | — 003,354 | 
+ *002,747 | — :007,598 | — -000,546 | 
— ‘001,577 ave! sd 004,568 | — 005,659 
+ °003,528 528 | + -0 031, 556 | + 025,872 
— ‘010,563 | — -039,312 | — 000,900 
+ 006,215 | —-000,280 | — -011,095 
+ 004,094 — 011,591 | — ‘000,451 
— ‘000,621 | + °007,792 | — 013,486 
— ‘000,256 +°007,107 | + 000,051 
— 002,780 + 008,056 | — 009,979 
— 001,476 | + ‘013,202 | + -010,824 
— 006,953  — ‘025,877 — -000,593 
+°010,877 — 000,491 ~-019,417 
~ 005,248 | + 014,861 | +-000,578 
+ 000,005 — 000,087 + °000,115 





002,066 | + 





005,715 


+ °000,411 | 


4 


“051,120 | + 


“006,120 4 





| 
| 


“002,346 | 


‘007 ,686 ~ 005,119 | 

008,128 — -000,126 | 4 

007,786 + 004,004 !+ 
000,371 — 000,244 | - 
004,249 +4 °001,541 | — 
000,421 | — °001,448 | - 
‘054,180 +°020,769 | + 
029,426 --019,599 | - 
‘001,036 — 000,016 + 
020,986 |— 010,792 | — 
003,934 + 002,594 + 
011,018 | — 008,995 + 
001,290 | + "004,439 + 


019,596 +" 


s=6 


— 011,029 | 


- 007,934 


‘003,270 
‘000,928 
‘001,042 
‘002,648 


030,401 | 
‘042,223 
‘001,011 
‘008,814 
"009,854 
‘002,703 | 
008,117 


28,684 


— “004,848 


— 004,994 


' + 000,152 


— 001,327 


+ 001,386 





‘008,428 
‘018,559 
‘001,077 


| + 003,434 | + “000,952 | 


| +-008,454 | 





“001,611 


03,440 | 
004,249 | 


a 
+ 
+ 013,461 
+ 


+ 007,952 | 


— 001,076 
— 016,616 
+ ‘001,362 
+ 002,579 
+ 000,683 
+ °005,132 
+ 003,836 
+ 004,330 
— 006,270 

005,205 


‘000,708 | 
“002,472 | 


‘002,940 


005,923 
001,396 
— 005,736 
— ‘000,367 
— ‘002,093 
-~ ‘001,616 


+ 
+ 000,894 
4+ 
“} 


002,744 


“005 ,364 
“003,943 


+ 
| 4 -008,582 
“+ 


001,474 | 


+ 000,195 


002,849 | 


+ 001,148 
+ 005,649 


‘008,810 | + °009,387 | 
003,311 | + ‘005,056 
“000,077 + ‘000,013 | 
| 4 001,575 | 


‘001,706 | —°001,136 | — 002,445 
‘015,976 | + ‘000,247 | — 015,594 
“002,123 ; — 001,092 | — 000,892 
‘006,296 | - 004,153 | — °015,772 
002,187 | — °000,793 | + °000,536 
001,559 | — 005,361 | — ‘009,804 
024,660 | + °009,453 | + ‘013,837 
006,865 | + °004,572 | + °009,850 
“006,029 | + 000,093 | — 005,884 
008,115 | + °004,173 | + °003,409 
001,729 | — 001,140 | - 004,331 
007,919 | + 002,872 | — -001,942 
000,272 | — 000,934 | — ‘001,708 
018,900 | + °007,245 | + °010,605 
009,390 | + °006,254 | + °013,474 
‘001,342 |+ 000,021 | — 001,310 
008,943 | + 004,599 | + °003,756 
“000,895 | + °000,590 | + *002,241 
006,704 | + 002,431 | —°001,644 
000,491 | + 001,688 | + °003,086 
‘017,640 | + 006,762 ;+°009,898 
013,607 | + 009,063 | + °019,524 
005,157 — ‘000,080 | + °005,034 | 
006,148 | + -003,162 + °002,582 
003,598 + :002,373 | + ‘009,013 
‘000,624 — -000,226 | + °000,153 
000,865 | + °002,976 |+ “005,442 
‘007,380 | + °002,829 | + 004,141 
008,956 | + :005,965 | + 012,851 
*009,026 | - 000,139 | + 
~ ‘007,882 | -- 004,053 | — 
“000,031 | — °000,020 | — 
005,043 | — 001,829 | + ‘001,237 
000,395 | — 001,359 | — 002,485 


i+ 001,2 270) — 003,679 (+ °004,557 | + 





+ 001, 301 | 





































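Applying these tests we find:

$S_{ss'}(\mathfrak{S}_s\tau_0\,\mathfrak{S}_{s'}\tau_0') = 1{\cdot}000{,}000$, $\quad S_{ss'}(\mathfrak{S}_s\tau_1\,\mathfrak{S}_{s'}\tau_1') = +{\cdot}000{,}001$,
$S_{ss'}(\mathfrak{S}_s\tau_2\,\mathfrak{S}_{s'}\tau_2') = +{\cdot}000{,}001$, $\quad S_{ss'}(\mathfrak{S}_s\tau_3\,\mathfrak{S}_{s'}\tau_3') = +{\cdot}000{,}008$,
$S_{ss'}(\mathfrak{S}_s\tau_4\,\mathfrak{S}_{s'}\tau_4') = -{\cdot}000{,}002$, $\quad S_{ss'}(\mathfrak{S}_s\tau_5\,\mathfrak{S}_{s'}\tau_5') = +{\cdot}000{,}001$,
and $S_{ss'}(\mathfrak{S}_s\tau_6\,\mathfrak{S}_{s'}\tau_6') = +{\cdot}000{,}002$,

results as close as we should expect, when we take into account the fact that our
$\mathfrak{S}\tau$'s were only to five figure accuracy, and our products to six.

The meaning of Table III should be quite intelligible; namely, for example:

$${\cdot}084 = {\cdot}079{,}464 + {\cdot}001{,}947\,r - {\cdot}002{,}229\,r^2 + {\cdot}001{,}538\,r^3 - {\cdot}014{,}745\,r^4 + {\cdot}000{,}897\,r^5 - {\cdot}014{,}884\,r^6 + \dots \quad \text{(xxi)}$$

is the equation which will give the correlation coefficient $r$ as deduced from the
(3, 2) cell. If $r$ be given any other value the right hand of the above expression
is equal to the contents of the (3, 2) cell for a normal correlation surface of
correlation coefficient $r$ having the observed marginal totals.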
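
To illustrate how the right-hand side of (xxi) is used, the sketch below evaluates the series for the (3, 2) cell at the trial values 0·45, 0·50 and 0·55 and forms the ratio of the theoretical to the observed frequency. The coefficients are those of (xxi), equivalently the products of the corresponding category differences in Tables I and II; truncation at the sixth power follows the text, and the function and constant names are hypothetical.

```python
COEFFS_32 = [0.079464, 0.001947, -0.002229, 0.001538,
             -0.014745, 0.000897, -0.014884]   # powers r^0 ... r^6, from (xxi)
OBSERVED_32 = 84 / 1000                         # observed proportion in the cell

def theoretical_proportion(r, coeffs=COEFFS_32):
    """Evaluate the truncated series sum_p coeffs[p] * r**p by Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * r + c
    return total

ratios = {r: theoretical_proportion(r) / OBSERVED_32 for r in (0.45, 0.50, 0.55)}
for r, ratio in ratios.items():
    print(f"r = {r:.2f}: theoretical/observed = {ratio:.6f}")
```

These ratios are the quantities entered for this cell in Table IV, so the sketch also serves as a spot check on one entry of that table.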

Thus far the arithmetic is absolutely comparable with that needed for Ritchie-Scott's "polychoric $r$." We should have to solve the 49 equations, and then calculate (the stiffest part of the work) the probable errors of the 49 correlation coefficients which are the roots of these equations. Using these probable errors as our weighting data, we should find a mean coefficient. Our purpose is to replace the weighting and the solution of the 49 equations by the solution of a single equation. It will be noticed that both Ritchie-Scott's and our methods have an undesirable limitation, for we both assume the marginal totals to be those of the normal correlation surface. Actually in our case we ought to treat the marginal totals as unknown, or select $h_1, h_2, h_3, \dots h_6$, $k_1, k_2, k_3, \dots k_6$ as well as $r$ to give as closely as possible the observed frequencies. Now the $\tau$'s and consequently the $T$'s and $\mathfrak{S}\tau$'s and $\mathfrak{S}T$'s all depend upon the $h$'s and $k$'s and the equations obtained by making

$$S_{ss'}\left(\frac{n_{ss'}^2}{\bar{n}_{ss'}}\right) = \text{minimum}$$

do not appear to lend themselves to any reasonably brief system of solutions. We were compelled therefore to introduce the admittedly limited form of solution, i.e. the determination of the best normal correlation surface subject to the restriction of its having the same marginal totals as the observed frequency surface. We consider this a practically necessary but none the less grave restriction.

We next proceeded to determine the value of $\bar{n}_{ss'}/n_{ss'}$ and $(\bar{n}_{ss'}/n_{ss'})^2$ for certain selected values of $r$ in order to build up equation (xvii) and solve it by interpolation. The values chosen were: 0·45, 0·50 and 0·55. These cover the range within which we anticipate the solution of (xvii) for $r$ will lie. We need also the value of the numerator in (xvii), i.e.

$$v_{ss'} = \mathfrak{S}_s\tau_1\,\mathfrak{S}_{s'}\tau_1' + 2r\,\mathfrak{S}_s\tau_2\,\mathfrak{S}_{s'}\tau_2' + 3r^2\,\mathfrak{S}_s\tau_3\,\mathfrak{S}_{s'}\tau_3' + \dots;$$

for the same three values of $r$. These results are given in Table IV.







TABLE IV. Values of $(\bar{n}_{ss'}/n_{ss'})$, $(\bar{n}_{ss'}/n_{ss'})^2$ and $v_{ss'}$.


















































8 Function =A s=1 | s=2 | s=3 s=4 s=5 | s=6 s=7 
Figg’ |[Nas! (a) | 1°602,750 750 ‘879,954 | °814,714 Pa ‘388, 000 | x oo 
Olives! Seme| ee] fee 
é) | 2°115,5' 3 |} 87,8907 6) ‘li 2 sa) 
(Tigy'/gg’)® (4) | 2°568,806 774,321 | -663,759 «© "150,544 | a0 re 
(b) | 3°407,716 | -813,604 | 497,830 «© 070,756 | © x 
(¢) | 4°475,340 | 839, 805 | *345,576 © "030,276 © ea 
Ves (@) | +°018,462 |+°011,177 | — 014,603 | — 009,218 | - 002, 733 | — 002,747 | — 000,336 
(b) +020, 480 |+ 008, 113 | — 015,910 | — -008,382 | — 002,154 | — — 001 929 — ‘000,218 
(c) |+ 022,694 |+ -004,477 | — 017,013 | — 007,243 |— -001,519 | - -001 "228 — 000,168 
— _ —{— |—_——__—_——__ | —| ——$ 
igs’ /sy  () 867,348 | “905,539 | 944,250 | 1°478,500 | 1°379,000 | 1°887,333 © 
(b) | *894,565| -944,630 | 939,833 | 1°383,615 | 1-215,375 | 1°544,667 00 
(c)| 915,130} -986,935 | -933,333 | 1-278,500 | 1°043,500 | 1°213,333 | © 
| | (Aey'/Mey)® (4) | *752,293 | 820,001 | *891,608 | 2°185,962 | 1-901,641 | 3°562,026 | oo 
| 2 (b)| *800,247| 892,326 | °883,286 | 1°914,390 | 1°477,136 | 2°385,996 ra 
| (c) | *837,463| -974,041| -871,110| 1-634,562 | 1-088,892 | 1°472,177 | © 
Ver — (@) | + 018,846 | + °116,000 | — 005,963 | - 046,943 | - -025,552 | — 041,623 | — -009,764 
(&) |+ 011,084 | +°125,051 | — -009,011 | — 051,854 | — 026,828 | — -040;529 | — -007, 913 
| (¢) | + 007,767 | +-135,874 | - -013,006 | -- -057,659 | — -028,171 | — 038,864 | — -005 "940 
TisyMgy (4) | 857,750 | 1°075,552 | 1°108,280 812,500 | -854,818 | 984,375 Ba 194,000 
| (4) | °751,875 | 1°076,195 | 1°138,747 | 823,410 -844,773 | “930,333 | 1°846,500 
| (¢)| *635,750 | 1-075,448 | 1°175,027 | -835,909 | -831,636 866,000 | 1°486,500 
| (Rgy/Nsy)* (4) | °735,735 | 1°156,812 | 1°228,285 | -660,156 730,714) “968,994 | 4°813,636 
| 3 (6) 565,316 1°158,196 | 1°296,745 | -678,004 -713,641 | 865,519 | 3°409,562 
| (¢)| 404,178 | 1°156,588 | 1°380,688 | -698,744| -691,618  *749,956 | 2-209,682 
Vs (4) | — 016,095 | + 002,075 | + 041,770 | + -013,402 | — -003,847 | — 023,749 | — 013,555 
(b) | — ‘017,786 | + 000,037 | + °049,827 | + -015,435 | — 005,038 — ‘028,268 | — -014,205 
| (¢) | — ‘019,311 | — 002,805 | + 059,278 | + -017,601 | — 006,601 | — °033,522 | - -014,539 
Tigy'/Ngy (@) | 1°634,000 | 1°155,897 | 1°078,222 | -808,892 | -850,571 | 1:225,786| -671,833 
(b) | 1+260,000 | 1 096,966 1°098,417 | 837,135 ‘877,714 | 1°239,929| -627,333 
(¢) | *917,000 | 1-030,862 | 1°121,944 869,568 — -907,429 | 1°250,000| -570,000 
(7igs'/Mog)* (4) | 2°669,956 | 1°336,098 | 1°162,563 -654,306 | -723,471 | 1°502,551 | 451,360 
| 4 (b) | 1-587, 600 | 1-203,334 | 1°206,520 | -700,795| -770,382 | 1°537,424| -393,547 
| (e) | 840, 889 | 1-062,676 | 1°258,758 | -756,149 | -823,427 | 1°562,500| -324,900 
Vey (a) \- 007, 715 | - 032,319 | + 013,434 + 019,505 | + -007,261 | + 004,459 | — 004,625 
(6) | — 007,215 | — 036,132 + °015,696 | + -022,370 | + 007,947 | + :003,430 |-- 006,095 
(c) |= -006,471 nk ‘040,720 | + 018,237 | + -025,717 | + °008,735 + 002,185 — ‘007,680 
Nigg'/Mey () a | 1°114,27 8 | 1-028,556 934,423 | -960,545 | “935,111 946,600 
(b) 9 a 006, 167 | 1°027,185 | 969,000 | 1:008,273) ‘978,044 -944,400 
(c) 2 890, 389 | 1°024,148 | 1:007,692 | 1-061,727 | 1°025,1:1 | 927,400 
(Tgg'/ Mee)” (a) © 241,615 | 1°057,927 | -873,146! 922,647 °874,433 | "896,052 
5 (d) x 1012/3723 1:055,109 | -938,961 | 1°016,614| 9 58,331 | ‘891,891 
(c) 10 | *792,793 | 1°048,879 | 1:015,443 | 1-127,264 | 1-050 853 | °860,071 
Ves ~— (@) |— 004,797 | — 037,653 | — 000,381 | + -017,025 | + 009,967 | + 015,398 | | + 000,440 
() | — 003,957 | — -040,252 | — -001,134 | + -018,995 | + -011,095 +016, 166 | — 000,916 
| (ce) |— 002,988 IC 043,158 | — *002,230 | + 021,305 | + 012,465 (+°017,113 3 | — “002,508 | 
| | Ngy/ Nyy (cL) © 1°461,333 °867,077 | 1:216,474 | 1°604,286 | -701 931 | 906,500 
(b) 2 1°224,000 °830,576 | 1°245,526 | 1°693,714 | °754,966 | -969,125 
(¢) 0 ‘988,111 | °786,077 | 1:273,789 | 1°791,000 *812,828 1°028,375 | 
Tigy| Nyy)? (a | wo 2135,494 | 751,823 | 1-479,809 | 2°573,734 -492,707 | “821,742 | 
6 (b) x 1°498,176  °689,856 | 1°551,335 | 2- 868, 667 | 569,974 | -939,203 
| (ec) oo ‘976,363 | 617,917 | 1-622,538 | 3-207,681 | -660,689 | 1 ‘057, 556 | | 
Vs (@) | — 003,069 | — 042,728 | — ‘017,170 | +-011,165 | + °012,060 | + -029,542 | + O10, 202 
(6) | - 002,189 |— -042,658 — 020,931 | + -010,905 | + 013,028 | + 032,069 | + -V09, 778 | 
(ce) |- 001,381 — °042,197 | — 025,479 | + 010,572 | + °014,219 + 035, 116 ; 009,152 | 
Ngy'[Ngy (a 0 ‘961,333 *747,556 | 1°462,667 | -845,000 1: 140,500 870,286 
(b) a °705,000 | 648,556 | 1°411,167 | *865,000 | 1°235,000 | 1-003,143 | 
(ec) 00 ‘491,000 | “542,000 1-337,333 *877,000 1-331,000 1°150,286 
(agg? /idgg’)” (4) wo | 924,161 | 558,840 | 2°139,395 | -714,025 | 1°300,740| °757,398 | 
7 (d) | x | *497,025 | *420,625 | 1°991,392 | -748,225 | 1°525,225 | 1-006,296 | 
| (c)|  @ 241,081 | 293,764 | 1-788,460 | 769,129 | 1°771,561 | 1-323,158 | 
| Vey (@) | — ‘000,633 | — 016,551 | — 017,086 | — 004,935 | + 002,845 |+-018, 719 | + 017,641 
| (b) | — 000,417 | — 014,160 | — 018,536 | — -007,468 | + -001,950 | 4-019, 060 |+°019,571 | 
| (c) | — -000,309 — ‘011,471 | - 019,787 |— + 000,873 |+°019,302 + 021,685 | 








010,293 








The values (a), (b), (c) refer respectively to r = 0·45, 0·50, 0·55.
















Having obtained $(\bar{n}_{ss'}/n_{ss'})^2$ and $v_{ss'}$ for the trial values of $r$, it is only a matter
of adding $v_{ss'}/(\bar{n}_{ss'}/n_{ss'})^2$ for all values of $s$ and $s'$ on the machine in order to obtain:

$$u = S_{ss'}\left\{v_{ss'}/(\bar{n}_{ss'}/n_{ss'})^2\right\}.$$


The values obtained were:

r =     0·45         0·50         0·55
u =  +·157,074    +·012,276    −·209,976
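
The inverse interpolation applied to these three values of $u$ can be sketched as follows. The sketch assumes that the formula used is the usual three-point central-difference quadratic, solved for the fractional step from the middle abscissa; the function names are hypothetical. It also evaluates the straight-line estimate through the first two points for comparison. The quadratic root it returns agrees with the value reported in the text to about one unit in the fourth decimal place.

```python
from math import sqrt

R_VALUES = (0.45, 0.50, 0.55)
U_VALUES = (0.157074, 0.012276, -0.209976)

def inverse_interpolate(xs, zs):
    """Root of z(x) = 0 by the three-point central-difference quadratic.

    z_theta = z0 + theta*(d_minus + d_plus)/2 + theta**2 * d2/2,
    with theta measured from the middle abscissa in units of the step.
    """
    step = xs[1] - xs[0]
    d_minus, d_plus = zs[1] - zs[0], zs[2] - zs[1]
    a, b, c = (d_plus - d_minus) / 2, (d_minus + d_plus) / 2, zs[1]
    disc = sqrt(b * b - 4 * a * c)
    # Keep the root of a*theta^2 + b*theta + c = 0 nearest the middle point.
    theta = min(((-b + disc) / (2 * a), (-b - disc) / (2 * a)), key=abs)
    return xs[1] + theta * step

def linear_root(x0, z0, x1, z1):
    """Zero of the straight line through (x0, z0) and (x1, z1)."""
    return x0 + (x1 - x0) * z0 / (z0 - z1)

r_quadratic = inverse_interpolate(R_VALUES, U_VALUES)
r_linear_ab = linear_root(0.45, 0.157074, 0.50, 0.012276)
print(round(r_quadratic, 4), round(r_linear_ab, 4))
```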


Whence by inverse interpolation* we find:

u = 0 for r = ·5034,

which is "polychoric $r$" as based upon Equation (xvii). We shall compare later the value for $r$ as found by other processes. But the above value is clearly well in accord with the usual result for paternal correlation in man.

Table V gives the working values of $v_{ss'}/(\bar{n}_{ss'}/n_{ss'})^2$.


TABLE V.


























Values of $v_{ss'}/(\bar{n}_{ss'}/n_{ss'})^2$.








“4 a 
| | 
ee 7 s=1 3s=2 s=3 s=:4 s=5 | s=6 | s=7 
| | | 
et eee : & £ a | |__| L 
(a) \+°007,186 +°014,435 | — 022,001 0 ~ 018,159 | 0 0 
1 | (b) |+ 006,010 | + -009,972 | — -031,958 0 — 030,424 0 | 0 
(ce) |+°005,071 + °005,331 | — -049,232 | O — 050, 132 O | ) 
a = 5 ee | 
| (a) |+°018,405 |+°141,463 | - 006,688 |— 021,475 | — ‘013,436 | —-011,685 O 
2 | (b) |+:013,851 4°140,141 — -010,202 | — -027,086 | — 018,162 |—-016,986 | 0 
(ec) \+ 009,274 |+ 139,495 — ‘014,930 | — 035,275 | — 025,871 | — 026,399 | 0 
| (a) |—-021,876 |+-001,794 4 034,007 | + ‘020,301 | — 005,265 ~ -024,! 500 | — 002,816 
3 | (b) |—-031,462 | + -000,032 | + -038,425 + 022,755 | —-007,060 —-032,660 | — -004,166 
| (ce) |- 047,779 — 002,425 | + 042,934 i+ 025,189 | — ‘009,544 | - ‘O44, 832 F 006, 579 
| (a) | ~-002,890 | —-024,189 | + -011,556 | + -029,810 | + 010,036 | +-002, 968 | — 010,247 
t)- @ I 004, 543 | — 030,027 | + 013,009 | + 031,921 | + 010,316 +002, 231 \- 015,487 
(e) 007,695 | — °038,318 + °014,488 | + 034,010 | + °010,608 \-+ 001 398 & 023,039 
| | : a 
| (a) 0 — 030,326 | — -000,360 | + ‘019,498 | — °010,803 | + -017 3610 + “000,491 
& +} @®) i) — ‘039,760 |— -001,075 | + °020,230 gE 010, 914 |+ 016, 869 | — -001,027 
| (e) O — 054,439 | — 002,126 | + °020,981 |+°011,058 +-016, 285 |— -002,916 
| (ce) 4,439 | ; 981 |+-011, : 
| (a) () — 020,008 | — -022,838 | + 007,545 |+-004,686 | + °059,958 |+ 012,415 
6 | (b) 0 ~ 028,474 |— -030,341 + -007,029 | + -004,542 | 4+ -056,264 |+-010,411 | 
(ec) O - 043,219 |— -041,234 + °006,516 | + °004,433 | + °053,151 | + -008,654 | 
\ 5) > ] ’ > | 
som | : et 
(a) 0 — 017,910 = -030,574 | — 002,307 | + “003,984 | + °014,391 | + 023,291 | 
7 (b) 0 — ‘028,491 |— 044,067 | — -003,750 | +°002,606 | + °012,497 | + -019,449 | 
| (c) 0 — 047,576 | - 067,856 - 005,755 | + 001,135 | + 010,895 | + 016,389 
S(a) = +·157,074, S(b) = +·012,276, S(c) = −·209,976.

* The formula used was Casus I or $z_\theta = z_0 + \tfrac{1}{2}\theta\,(\Delta z_{-1} + \Delta z_0) + \tfrac{1}{2}\theta^2\,\Delta^2 z_{-1}$, the solution of the quadratic giving $\theta$. The table suggests, a posteriori, that we should have got quite reasonable results from linear interpolation; we have: from (a) and (b) r = ·5042; from (a) and (c) r = ·4928, and from (b) and (c) r = ·5025, as against our ·5034. It should be noticed that the values in Table V are not always in agreement in the last figure with those obtained by dividing $v_{ss'}$ in Table IV by the $(\bar{n}_{ss'}/n_{ss'})^2$ of that table, because the somewhat more accurate process was adopted of multiplying $v_{ss'}$ by $n_{ss'}^2$ and then dividing by $\bar{n}_{ss'}^2$. Still the physical meanings of $\bar{n}_{ss'}/n_{ss'}$ and $(\bar{n}_{ss'}/n_{ss'})^2$ are so prominent in the work that it seemed desirable to register their values.










Before we consider the graph due to this solution, let us investigate the value of $r$ to be found from (xvi). The values of $\bar{n}_{ss'}/n_{ss'}$ are already provided in Table IV, but we need a table corresponding to Table III giving the product $\mathfrak{S}_s T_p\,\mathfrak{S}_{s'}T_p'$ instead of the product $\mathfrak{S}_s\tau_p\,\mathfrak{S}_{s'}\tau_p'$. This is provided in Table VI. Further if

$$\kappa_{ss'} = \mathfrak{S}_s T_0\,\mathfrak{S}_{s'}T_0' + r\,\mathfrak{S}_s T_1\,\mathfrak{S}_{s'}T_1' + r^2\,\mathfrak{S}_s T_2\,\mathfrak{S}_{s'}T_2' + \dots;$$

Table VII (p. 143) provides $\kappa_{ss'}$ for the same three values of $r$, i.e. 0·45, 0·50 and 0·55. Finally Table VIII (p. 143) gives $\kappa_{ss'}/(\bar{n}_{ss'}/n_{ss'})$, whence by summing we obtain
$$v = r - S_{ss'}\left\{\kappa_{ss'}/(\bar{n}_{ss'}/n_{ss'})\right\},$$
for the three cases. 

Using the same interpolation formula as before in order to discover the value 
of r for which v = 0 we find: 

r = ·5204.

There is thus a difference of ·0170 between the two methods. The probable
error found for the product-moment $r$ is ·0160 and the result by the usual
product-moment process may be given:

r = ·5189 ± ·0160.

Thus either of the values reached by the methods of this paper differs by less
than the probable error from the true product-moment value.

(4) If we work out the results by mean square contingency we find:

$C_2$ = ·480,690,

and the class index correlations are*:

For fathers: $r_{xC_x}$ = ·962,329.
For sons: $r_{yC_y}$ = ·964,523.

Hence correlation from mean square contingency

$r = C_2/(r_{xC_x}\,r_{yC_y})$ = ·5179,

which is in excellent agreement with the product-moment value.
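
The arithmetic of this correction is short enough to verify directly. The sketch below divides the mean square contingency coefficient by the two class index correlations quoted in the text; the constant and function names are hypothetical.

```python
C2 = 0.480690                # mean square contingency coefficient
R_CLASS_FATHERS = 0.962329   # class index correlation, fathers
R_CLASS_SONS = 0.964523      # class index correlation, sons

def corrected_contingency_correlation(c2, r_class_x, r_class_y):
    """Divide the contingency coefficient by both class index correlations."""
    return c2 / (r_class_x * r_class_y)

r_contingency = corrected_contingency_correlation(C2, R_CLASS_FATHERS, R_CLASS_SONS)
print(round(r_contingency, 4))
```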

It would therefore be quite reasonable for such a table as the present to use
mean square contingency and class index corrections, and save the heavy labour
of Equation (xvi bis) or (xvii). At the same time we cannot assert that this
process would always be equally satisfactory for tables with but few broad
categories and with much higher correlation.

Our two processes seem to give values slightly in defect and in excess of the
true value of $r$, and we might use their mean, i.e. ·5118, to obtain our graph. We
shall, however, first proceed to compare the actual results of solving (xiv) and
substituting in (xv) with the result of such approximative processes.

Table IX (p. 145) gives the products of $, 7,3, 7,’ and will therefore enable us 
by aid of Table IV (p. 140) which gives the values of figg/ngy to obtain h,. for any 
value of r. Let 

Ney = Sy Ty Se Te +e Ly Se Te HI De Ve Fa Te HK ace cvceccves (xxii). 


* Using the values of $\bar{x}_s$ and $\bar{y}_{s'}$ in Tables I and II respectively.

































































TABLE VI. 
Values of $\mathfrak{S}_s T_p\,\mathfrak{S}_{s'}T_p'$.
| s=1 s=2 | s=3 s=4 | s=5 | s=6 3s=7 Pp 3’ 
| 
+ *005,966 | + 022,207 + °000,509 | — 007,686 '— 005,120 !— -011,028 |—-004,848 | 0 
| + °030,608 | + 054,185 | + -001,722 | + -010,633 |4+°011,536 + °037,126 |+ °025,890 
| + °054,736 | + ‘014,342 | — 000,976 |+ 010,114 | + 000,145 | — -029,528 |— 048,833 
| + 035,199 | — :033,825 |— 002,246 | — ‘012,136 | - (010,270 , — -010,999 | + °034,276 
| + °002,562 | — 007,587 | + ‘000,169 |~ "000,915 | + °001,407 | + 007,853 — *003,489 | 
| + °005,180 | — 004,622 | — ‘000,915 | — -004,289 | — 002,583 | + °0N2,272 | 1 -005,590 | 
+ 007,003 | — 011,727 | + 000,212 | — -000,013 | + °003,287 | + 008,380 | — “007,142 | 
; es SSS a 3 : s {|__| —— 
| + 022,842 | + 085,018 \+ *001 ,948 Ps 029,426 | — °019,601 | — 042,222 | — 018,559 
+ 056,787 | + °100,528 | + 003,195 | + -019,728 + 021,402 | + °068,878 | + 048,033 
| + 017,375 | + °004,553 | — ‘000,310 | + 003,210 +4 -000,046 | — -009,373 — 015,501 | 
| — ‘033,035 | + °031,745 |+ "002,108 | + -011,389 | + 009,639 | + -010,323 | — ‘032,169 2 
— (007,358 + 021,786 | — 000,486 | + 002,627 — 004,040 | — 022,548 | + °010,019 | 
| — 004,261 | + 003,802 | + 000,753 |+ 003,528 | + ‘002,124 | — -002,774 |— 003,172 
be ‘011,913 | + °019,949 | — 000,362 | + 000,022 | -- 005,591 | — 014,255 | + 012,150 
= le | es eae 
| + -001,324 | +-004,929 | + -000,113 | — 001,706 | — -001,136 | — 002,448 | — 001,076 
| + 002,378 +°004,211 | + -000,134 | + -000,826 | + -000,896 | + -002,885 |+ °002,012 
| — 002,423 | — 000,635 | + 000,043 | — -000,448 | — -000,006 | + -001,307 |+ 002,162 
3 | — 002,976 | + 002,860 | + 000,190 | + -001,026 | + 000,868 | + 000,930 — 002,898 3 
+ °000,365 | — ‘001,080 | + *000,024 | — -000,130 + 000,200 | + 001,118 | — ‘000,497 
— ‘001,271 |+°001,134 | + 090,225 | +-001,052 | + -000,634 — 000,827 | — 000,946 
+ 000,475 i 000,796 |+ °000,014 i— 000,001 | + 000,223 | + -000,569 | — 000,485 
| | 
— :005,329 | — 019,833 | — -000,455 | + “006,865 + 004,573 | 4+°009,850 | + °004,330 
+ (006,217 | + °011,006 | + -000,350 | + -002,160 | + ‘002,343 | + °007,541 |+ °005,259 
+ 008,127 |+ 002,130 |— 000,145 | + -001,502 | + 000,022 | — 004,385 | — 007,251 
— 007,217 | + 006,935 |+ 000,461 | + -002,488 | + -002,106 + °002,255 | — 007,028 
— :000,941 | + 002,786 | — -000,062 | + -000,336 | — :000,517 — °002,883 |+ °001,281 
— 002,847 | + °002,540 | + -000,503 | + -002,358 | + -001,419 — ‘001,854 | — 002,120 
- 000,780 | + °001,306 | — 000,024 | + 000,001 | — °000,366 | — *000,934 + °000,796 
us cel | = | oa 2 — 
0 |—:007,289 —~ -027,130 | — -000,622 | 4+ 009,390 |+°006,255 4°013,473 | + ‘005,923 
1 |+°014,662 | + -025,956 | + ‘000,825 | + 005,094 | + 005,526 + °017,784 |+ ‘012,402 
2 | +°002,953 | + 000,774 | — 000,053 | + -000,546 | + 000,008 | — -001,593 | — 002,635 
3 | - 013,673 | + 013,139 | + 000,873 | + -004,714 | + 003,990 | + 004,273 | — 013,315 
4 |+°001,057 | — 003,128 | + 000,070 | — -000,377 | + ‘000,580 | + -003,238 | — 001,439 
5 | — 004,116 | + 003,672 |+°000,727 | + 003,408 + -002,052 | — 002,679 | — 003,064 
6 | + 003,320 | — 005,559 | + 000,101 | — 000,006 | + °001,558 | + 003,973 — ‘003,386 | 
| — ~ a | 
— 010,563 — °039,314 | — 000,901 | + 013,607 | + 009,064 + °019,524 | + °008,582 | 
| + 033,046 | + °058,500 | + ‘001,860 | + ‘011,480 | + °012,454 + 040,082 |+°027,952) 1 | 
| — 021,493 | — -005,632 | + ‘000,383 |— -003,971 | — *000,057 + °011,595 |+°019,175 | 2 | 
— ‘013,795 | +°013,256 | + -000,880 | + -004,756 | +°004,925 + 004,311 |— "013,433 | 3. | 
+ 006,142 | — 018,186 | + “000,406 | — 002,193 | + 003,372 + ‘018,822 |— -008,364) 4 | 
+ °001,134 | — 001,011 | — -000,200 | — 000,939 | — -000,565 + -000,738 |+ 000,844 | 5 | 
|+ -008,563 | - 014,340 + 000,260 | - 000,016 + 004,019 +4 010,247 |— 008,733 | 6 | 
ieiies | | — —|———_| 
| — 006,952 | — ‘025,876 — 000,593 | + 008,956 + 005,966 + °012,851 |+ 005,649 | 0 
| + 034,867 | + °061,193 | + 001,945 | + -012,009 | + 013,028 | + 041,927 | + °029,238) 1 | 
| — 059,276 | — 015,532 | + 001,057 | - -010,953 | — ‘000,157 | + 031,978 | +°052,883 | 2 | 
| + 035,498 | — ‘034,111 | — 002,265 | ~ 012,238 | — 010,357 — 011,092 | + ‘034,567 | 3 7 
— 001,827 | + 005,409 | — 000,121 + sere | 001,003 | — 005,599 |+°002,488 | 4 
+ 006,181 | — 005,515 | — “001,092 | — 005,118 | — 003,082 | + 004,024 |+ °004,602 | 5 
— 006,668 | + "011,167 | — 000,202 |+ 000,202 — 003,130 — -007,980 + 006,801 | 6 






























































TABLE VII. Values of $\kappa_{ss'}$.
| s' r | s=1 s=2 | s=3 | s=4 s=5 s=6 s=7 
| (a) |+034,290 | + -045,918 |+-000,873 —-002,076 ~ -000, 798 — 000,849 — -000,094 
1] (8) |4:039;786 + -0472855 + 000,831 —-001,549 |—-000,541 — 000,496 | — -000,036 
| | (c) |+°045,903 + 049,468 + ‘000,762 | — 001,097 — -000,350 —-000,251 — 000,001 | 
| Satie tipes Wied ea pee 
| | (a) |-++048,425 | 4+135,200 + “003,506 | - 018,688 | — 009,255 — 013,278 — -002,561 
2 () i+ 050,671 +°142,180 +:003,719 |—°017,061 |—-007,957 — 010,554 |— -001,722 
| | It +052,617 |+°149,704 +-003,946 — 015,291 |—-006,630 —-008,054 — -001,089 
| —e Sacenaeane . 
re (a) +0 -001,628 |-+ 006,926 + 000,205 — 001,317 |— 000,633 —-000,765 — 000,039 
| (b) |+ 001,526 | + 007,188 + *000,223 | — 001,252 | - 000,545 —-000,509 + -000,040 
| (ce) ++ 001,386 |-+-0072465 | + -000,245 |~-001,175 | — “000,444 — -000,235 + -000,096 
} — — ——_— = ———— _—— —__—— 
(a) |—-001,641 |—-vis,045 ~ 000,278 | + °008,425 | + °005,826 +°012,401 | + -004,608 
| 4 (b) | — 001,250 | — -012,657 | — 000,247 | + °008,726 + -006,019 4 °012,553 + -004,294 | 
| (ec) |—*000,903 | —°011,563 — 000,211 |+ °009,071 | + °006,233 +012, 663 + 003,892 | 
| — a eS — 
| (a) |— 001,344 | 014,202 | ~ .000,16d |-+ 012,270 | +-009,181 +021, 659 + -009,613 | 
5 (b) |=-000;940 | — -012;484 — 000,085 + -012,745 + -009,643 + -022,682 + -009,562 
| (ec) |—+000,625 | — 010,689 + 000,007 |+°013,278 |+°010,160 + 023,755 | + “009,352 
(a) | ~-000,958 |—-013,805 | +-000,109 |+-018,295 |+-015,185 +-041,172 | + -023,419 
6  (b) 000,584 | - -011,207 +-000,258 | + 018,782 |+ 016,036 + °044,362 + -025,040 
(e) | —-000,328 | — -008,749 + -000,419 | + -019,263 | +-016,957 + °047,837 + -026,557 
(a) |—*000,047 — -004,380 | + -000,263 |+°010,961 +-010,729 +-036,961 + 032,908 
7 | (b) | --000,076 —-003,086 | +-000,316 +-010,574 +-010,938 +-040,073 4+ -038,215 
(ce) |4000,159 —-002,067 |-+-000,348 4:010,019 +°011,027 +4 043,208 +. -044,126 | 
TABLE VIII. Values of Ksy/(Tisy/ nse’).
| s’ r | s=1 s=2 s=3 e=4 | =G | s=6 o=7 | 
| we — 
(a) |+°021,394 |+-052,182 +:001,072} 0 — 002,057 0 o | 
| 1 | (6) | +°021,552 |+-053,054 + -001,178 | 0 — 002,033 | 0 0 | 
(ec) |+°021,698 | + 053,980 + -001,296 | 0 — 022,011 | 0 0 | 
(a) +°055,831 +°149,303 + 003,713 | — 012,640 |—-006,711 | — 007,035 0. (| 
2 (b) +°056,643 +°150,514 + 003,957 | —-012,331 | — 006,547 | — 006,833 o | 
(e) +°057,497 +°151,686 + -004,228 | — 011,960 | — 006,354 | — 006,638 24 
&: | 
(a) |+°001,898 + 006,439 + -000,185 | - 001,621 |— -000,741 |— -000,777 — -000, 018 | 
3 (b) | +002,029 +-006,679 + -000,196 | — ‘001,521 |— 000,645 | — 000,547 + -000,022 | 
(e) | +°002,180 + 006,941 + 000,209 | — 001,406 | — 000,534 | — 000,271 + -000,065 
— — | | 
(a) |—°001,004 --011,805 — -000,258 |+-010,415 + °006,850 |+ 010,117 + 006,859 
1 (b) | ~ 000,992 — -011,538 | — ‘000,225 | + 010,424 | + 006,858 |+°010,124 +-006,845 
(ec) |— 000,985 — -011,217 |— 000,188 | + 010,432 + 006,869 | + 010,130 + -006,828 | 
(a) | 0 — ‘012,745 | — 000,160 |+ 013,131 + -009,558 |+ -023,162 + -010,155 | 
5 | (b) | 0 — 012,407 —-000,083 | + -013,153 + -009,564 |+ ‘023,170 + -010,125 | 
(ec) | 0 | 018,006 — 000,007 |+ °013,177 | + "009,569 | + 023,173 + 010,084 | 
= | | = a 
(a) 0 — 009,447 + 000,126 |4 015,039 + (009,465 |+-058,655 + -025,835 
6 | (b) 0 —-009,156 +-000,311 |+°015,079 + -009,468 | + -058,760 + -025,838 | 
(e) 0 = -008,854 + 000,533 + °015,123 + 009,468 |+-058,853 + -025,824 | 
. | = : | 
(a) 0 |= 004,556 + 000,352 |+ 007,494 + 012,697 | + 032,408 |+ -037,813 
7 | (6) | © — |~*004,377 + 000,487 | + “007,493 + °012,645 | + 032,448 | + -038,095 
(c) | 0 - 004,210 + ‘000,642 | + = 92 +°012,574 | + 032,463 + °038,361 
. ro Paha = See Ss ee | 
S(a) = +.510,573, S(b) = +.517,476, S(c) = +.524,735,
V_a = -.060,573, V_b = -.017,476, V_c = +.025,265.








KARL PEARSON AND EGON S. PEARSON 145


TABLE IX. 
Values of 357) Sy Ty’. 




































































s’ | p s=1 s=2 | s=3 s=4 | s=5 s=6 s=7 p s | 
| 0 | —:002,689 | — -010,007 |—-000,229 + -003,464 + -002,307 +-004,970 |+ -002, 185] 0 | 
| 1 |—°013,450 | — -023,810 | — -000,757 | — 004,673 | — 005,069 |—-016,314 |—-011 377 1 
| 2 | —:023,067 | — -006,044 |+-C00,411 | — -004,262 |— 000,061 |+°012,444 |+°020,579 | 2 
Ly 2 - 013,496 + 012,969 + 000, 861 |+ 004,653 + 003,938 + 004,217 j— 013,142 | 3 1 
| 4 —°000,450 + -001, 333 | — 000,030 | + 000,161 —-000,247 — 001,380 |+-000,613 | 4 
| 5 |—-003,007 + 002,683 | + °000,531 | + 002,490 + °001,499 | — -001,958 /— 002,239 5 
| 6 | — 003,082 | + -005,161 | — 000,094 +°000,006 —°001,446 |— ‘003,688 |+ 003,143 | 6 
| 
tf O |—°023,802 — -088,; 590 | - 002,030 | + *030,662 +020, 425 + 043,996 + 019,339] 0 
| 1 |— 051,494 — -091,158 | — -002,898.| -- -017,889 — 019,407 —-062,458 |—°043,556 | 1 
| 2 |—-002,940 ~ -000.770 + 000,052 | — 000,543 — 000,008 |+ 001,586 |+ 002,623 | 2 
2 | 3 | +:°036,379 — :034,958 | — 002,322 | — -012,542 ~ ‘010,614 | — -011,368 +°035,425| 3 | 2 
| 4 (+°004,781 | —-014,154 |+-000,316 | — 001,707 + °002,625 |+-014,650 —-006,510} 4 | 
| 5 +°007,797 | -- 006,957 | — ‘001,378 | — 006,456 | — °003,887 | + 005,076 | 4 005,805 5 
6 +:°009,448 — -015,822 |+ 000,287 | — "000,017 + 004,434 +:°011,306 —-009.636| 6 | 
| 
) ~ 022,45 58 | — ‘083,587 | — 001,915 | + 028,931 |+°019,271 |4+ 041,511 |+°018,247| O | 
1 |—-002,986 | — -005,286 | — 000,168 | — 001,037 |— °001,125 — -003,622 |—-002,526] 1 | 
2  +:'045,338 |+°011,880 | — 000,808 | + ‘008,377 + °000,120 — 024,459 |— 040,449 | 2 
3 3 +:°003,681 | — °003,537 | — -000,235 | — -001,269 —°001,074 —-001,150 +°003,584 | 3 3 
4 — 007,651 | + °022,655 | — -000,506 | + 002,732 |— -004,201 — 023,447 |+°010,419 | 4 
5 | +°001,548 | — -001,381 | — 000,273 | — -001,281 |— 000,772 +:°001,007 |+-°001,152| 5 
6 |—"011,412 |+°019,111 | — 000,346 + 000,021 |—-005,356 — 013,656 +°011,639| 6 
0 —*010,833 | — 040,322 |— -000,924 | + °013,956 |+ °009,296 + -020,025 |+ -008,802] 0 
1 | +°012,013 | + -021,267 | + ‘000,676 | + 004,173 |+ 004,528 +°014,571 |+°010,161 | 1 
2 |+°017,109 |+°004,483 | —-000,305 | + -003,161 |+°000,045 | — -009,230 | — -015,264 2 
4 3  —°014,068 |+ °013,518 | + °000,898 | + 004,850 + 004,105 + -004,396 |— -013,699 | 3 4 
4 |— 002,101 |+ 006,221 | — 000,139 | + -°000,750 |— 001,154 —-006,439 |+-002,861 | 4 
5 —*005,604 | + ‘005,000 | + -000,990 | + 004,640 |+ 002,794 —-003,648 ,— 004,172} 5 
6 — 001,988 | + °003,329 | — -000,060 | + 000,004 | — (000,933 — 002,379 |+ 002,028 | 6 
x a = eer | ——__—____, —————— rae kee ot 
0 |—°008,303 | — 080,904 | — -000,708 |+ 010,696 +°007,125 + 015,347 |+°006,746 | 0 
1 +°016,433 | + 029,090 | + °000,925 |+ 005,709 |+ °006,193 | + ‘019,931 |+°013,899} 1 | 
2 +7003,809 | + °000,998 | — 000,068 | + -000,704 |+ 000,010 — 002,055 |— 003,399 | 2 | | 
5 3 — 015,502 |+ °014,897 |+ 000,989 | + 005,344 |+ "004,523 + 004,844 |— 015,096} 3 | 5 | 
4 +°001,087 | — °003,220 | + -000,072 | — 000,388 | + 000,597 | + °003,332 |— °001,481 4 | 
5 (—*004,744 |+ 004,233 + 000,838 |+ "003,928 + °002,365 | — 003,088 |— °003,532 5 | 
6 | +:003,592 | -- °006,016 | + 000,109 | — 000,007 |+ °001,686 | + :004,299 |— 008,064 6 
0 | —*007,749 | — 028,843 | — 000,661 | + °009,983 |+ °006,650 + °014,324 I+: 006,297 | 0 
1 |+:023,811 | + °042,152 |+ :001,340 |+ 008,272 +°008,974 | + °028,881 |+°020,140; 1 
2 —'014,636 | — :003,835 | + -000,261 | — -002,704 '— 000,039 |+ °007, 896 -- 013,057 | 2 | 
6 3  —:010,657 +°010,241 | +-000,680 | + 003,674 | +°003,110 + 003.330 |— 010,378 | 3 6 
| 4 |4+:004,372 —-012,946 |+-000,289 |—-001,561 ++ 002,401 + °013,399 |— 005,954} 4 | 
5 +:°000,442 |—-000,394 | — 000,078 | — 000,366 — -000,220 | + °000,288 | + °000,329 5 | 
| 6 +:°006,335 — °010,608 + 000,192 |—:000,012 + °002,973 Sa eee | ee 6 | 
| 0 — 003,242 | —-:012,067 | — 000,276 +:°004,177 +:°002,782 + °005,993 |+ -002,634 |} 0 
1 +°015,673 + -°027,746 |+-000,882 |4+°005,445 + °005,907 + °019,010 |4+°013,257} 1 | 
| 2 |—+025,614 — -006,712 |+°000,457 — 004,733 — -000,068 | + °013,818 | +°022,851] 2 | 
7 | 3 |+°013,663 |— -013,130 |—-000,872 |— 004,711 — -003,987 | -- 004,270 |+ °013,305 | 3 | oe 
| 4 |—-000,037 + :000,111 | — 700,002 + 000,013 — -000,020 —-000,114 |4+°000,051 | 4 | 
| 5 |+:003,569 |— 003,184 | — 000,631 — 002,955 |— -001,779 + °002,323 + °002,657| 5 | | 
| 6 |= 002,893 | +:004,845 [= ‘000,088 + 000,005 | —-001,358 — 003,462 + °002,951 | 6 | 
' 










TABLE X. 
Values of Sstp3wT'y’. 
























































3’ p s= s=2 | s=3 s=4 | s=5 s=6 | e=7 
| | 
| | 0 |— 002,716 |—-024,296 | — ‘019,919 | — 013,581 | — 005,206 | — ‘007,621 |= 002,113 
| — 013,578 | — -050,535 | — ‘001,157 + °017,491 | + °011,650 | + °025,097 | + °011,032 
2 — 023,244 + -001,049 | + 041,494 + °019,288 | + 000,298 — 018,826 | — 020,060 
ee | 3 "013,520 + 038,284 | + -001,489 | — 020,306 | — 010,442 | — -008,529 | + 013,024 
| | 4 —*000,364 | + -004,567 | — 007,904 |— 002,108 + °001,391 | + °005,282 |+ 000,864 
| 5 — 002,999 | + -008,296 | + 000,596 | — ‘007,320 — 002,654 |+ °001,795 | + ‘002,285 
| 6 |—:003,074 |+ 008,906 | — 011,032 | — -000,956 + "003,290 |-+ 006,016 | ~ 003,149 
| | 0 (—*010,399 — -093,014 | —+076:260 —-051,995 — ‘019,931 | 029,175 — ‘008,088 
1 — °025,191 | — ‘093,756 | -- 002,147 | + °032,451 | + 021,614 + 046,563 | + °020,467 
| | 2 |—'007,878 +-000,333 |+°013,171 + :°006,123 | + 000,095 | — ‘005,976 | — 006,367 
2 | 3  +°012,689 |—-035,930 | — 001,397 + °019,057 | + ‘009,800 | + 008,004 | — 012,223 
| 4 |+:001,044 | —-013,114 | + 022,696 | + -006,054 | -- 003,993 | — 015,167 | + °002,480 
| 5 |+:°002,467 —-006,824 | — 000,490 + ‘006,021 | + °002,183 | — ‘001,477 | — ‘001,880 
| | 6 |+°005,229 | —-015,149 |+°018,767 +:°001,627 | — ‘005,596 | — 010,234 |+ °005,357 
pee : ‘ | = : 
| 0 (—:000,603 |— 005,392 |—-004,421 — 003,014 — 001,155 | - 001,691 | — 000,469 
— ‘001,055 | — 003,927 — 000,090 + 001,359 | + °009,905 | + °001,950 | + ‘000,857 
2 |+°001,029 | — 000,046 | — ‘001,837 — ‘000,854 | — 000,013 + ‘000,833 | + ‘000,888 
3 3 |+ 001,143 | — -003,237 |— 000,126 +-°001,717 |+ 000,883 | + ‘000,721 |— °001,101 
= | ~ 000,052 + 000,650 |— ‘001,125 — 000,300 | + °000,198 | + ‘000,752 | — ‘000,123 
5 | +:°000,736 | — -002,035 |—-000,146 +°001,795 | + °000,651 | — 000,440 | — ‘000,561 
6 — 000,209 |+ -000,605 |— 000,749 | — -000,065 + 000,223 | + 000,408 — 000,214 
O | +°002,426 | + -021,699 |+ °017,790 | + °012,130 | + °004,650 | + ‘006,806 | + ‘001,887 
1 |— 002,758 | — 010,265 | — 000,235 + 003,553 | + ‘002,366 | + ‘005,098 | + °002,241 
2 |—'003,451 |+-000,156 | +-006,161 | + 002,864 |+ 000,044 | — 002,795 | — ‘002,979 
4 3 + °002,772 | — 007,849 | — 000,305 | + 004,163 | + °002,141 | +t — 002,670 
4 |+°000,134 | — -001,677 | + ‘002,902 | + °000,774 | — 000,511 | — 001,939 |+ ‘000,317 
5 |+ 001,648 | — 004,560 | — ‘000,328 | + 004,023 + 001,459 | — 000,987 | — 001,256 
6 | + 000,342 — 000,992 | + °001,229 + °000,107 Neesasxad Siocon + ‘000,351 
0 |+:003,318 |+ 029,682 | + 024,335 | + ‘016,592 | + :006,360 | + 009,310 | + 002,581 
1 |—°006,504 | — 024,207 — 000,554 + °008,379 | + 005,581 | + +012,022 | + +005 ,284 | 
2 |— 001,254 | +4 -000,057 | + 002,239 |+°001,041 |+ 000,016 | —-001,016 | — ‘001,082 | 
5 3 | +:005,252 — ‘014,872 | — 000,578 |+ °007,888 + 004,056 |+°003,313 — *005,059 
4 |-— 000,150 |+-001,883 | — ‘003,259 | — 000,869 | + 000,573 | + 002,178 | — ‘000,356 
5 |+°002,383 |—-006,592 — 000,474 |+°005,816 | + 002,109 | — 001,427 |- 001,816 | 
6 | —001,457 | + 004,222 |— -005,230 | — 000,453 | + °007 560 | + 002,852 | — 001,493 | 
| 0 +°004,809 + 043,011 +°035,264 024,044 009,217 | 013,491 | + °003,740 | 





+ 009,127 — -000,412 | -- 016,293 007,574 --°000,117 007 392 | + °007,876 | 


+ + 

1 + + 

2 - + 
6 | 3 |+°005,299 015,004 | — 000,583 007,958 + :°004,092 | + °003,342 | — ‘005,104 

4 + + 

5 - + 

+ + 


+" 

|— "014,659 | — 054,559 | — 001,249 + °018,884 012,578 | + 027,096 |+°011,910 
ok +° 

— 000,872 +:°010,946 |—°018,945 —*005,054 003,333 |+ °012,661 | — 002,070 | 

— 000,656 + -001,815 |+°000,130 | — 001,602 

+ 


f 000,581 ‘000,393 | + *000,500 
6 |-—°003,758 010,889 | — °013,490 | — ‘001,169 


-004,023 | + °007,356 | — 003,851 | 











003,165 + °028,310 | + 023,211 |+ °015,825 + °006,066 





Oo \+ + °008,880 + 002,462 | 
| l —'015,334 — -057,071 | — 001,307 + °019,753 + 013,157 | + °028,344 |+ 012,459 | 
| 2 + °025,172 — ‘001,136 | — 044,936 | — 020,888 | — 000,323 i+ 020,387 | + 021,724 | 
i 3 |—°013,635 + -038,608 |+ 001,501 — 020,478 | —°010,531 | — 008,601 | + °013,134 | 
+ 000,259 — 003,256 | + °005,635 | + °001,503 — ‘000,991 | — 003,766 + °000,616 
| | § | --*008,579 +:009,899 |+°000,711 | — °008,734 | — *003,167 | + *002,142 | + ‘002,727 
| 6 |+ + 000,911 | — °003,133 | - 005,729 | + ‘002,999 | 


‘002,927 — -008,480 + °010,505 





QuorkAwdeO© | 
NS) 


| 


~ 


ee el 


































We shall proceed to calculate $\lambda_{ss'}$ for three values of r which lie near the
probable value of r as found from each column. We will take these as .45, .50 and
.55; from these values we shall obtain $\bar{h}_{s\cdot}$ for each column from (xiv) and,
interpolating the real $\bar{h}_{s\cdot}$ between them, find the corresponding columnar r, which will
then be substituted in (xv) by aid of Table X to obtain the columnar mean $\bar{k}_{s\cdot}$.
Table XI gives the values of $\lambda_{ss'}$ for r = .45, .50 and .55, and Table XII the
resulting values of $\bar{h}_{s\cdot}$.


TABLE XI. Values of $\lambda_{ss'}$ for r = .45, .50 and .55.





s’ r s=1 s=2 s=3 s=4 | s=5 | s=6 | s=7 

(a) |- 014, 742,01 | — ‘020, 616,58 | - —°000,400,18 + 000,974, 70 |+ 000,377, 97 | 000,409,54 4. *000,044,95 
l (b) |_. 017,038,00 | — ‘021,554,08 | — 000,383,88 | + °000,731,59 | |+ 000,258,31 | +°000,246,06 | + 000, 015,95 
(ce) |= 019,587,49 | — "022,37 3, 22 | ew 000, 356,40 | + 000,518, 95 + 000, 168, 60 |+ 000, 136,31 slat 000. 003,29 




















| (a) |—043,836,23 ~-133,792,74 — 003, 545,25 + °021,169,83 | +°010,795,76 | + 015,963,45 |+-003,258,21 

2 | (b) | ~-045,046,53 —*140,080,50 —~ -003,775,08 + 019, 705,30 +009,504,62 +012,993, 41 |+-002,268,84 

| (e) \—045,869,07 —-146,859,24 | — -004,026,98 | + 018,090, 53 | +°008, 150,14 | +°010,141,50 | + °001,500,21 
Se EEE! es —S—— |-- - 

(a) |--014 665,26 — 082,820,10 '~-002 204,29 | +080, 133,27 | + 018, 460,19 iG 033, 767,07 | +-009,791,12 

3 (b) | --012,764,50 | —-082,030,73 ~ 002,27: 5,94 + 030,479,17 +°018,233,88 | +031, 794, 16 | +°008,188,80 

(ec) TEbtsicusen — 080,956, 49 | —*002,360,54 + °030,869,67 | + ee +°029,455,85 | +-006,551,72 





ao ———————-} -—___—_ 


] 

(a) |- -003,450,60 | —-028,237,21 | |= -000,587,66 +°017,032, 32 +°011,713,27 + 024,762,35 | + -009,092,34 
| 
| 


1 | (b) | —-002,645;25 | — -026,280,92 | — -000,528,69 | +:017,630,94 | +-012,084,98 + -024,998,89 | + -008, 434,25 
(e) |—-001,920,26 —-024, 1 106,94 | ~ 000,459,56 | +°018,316,53 | +°012,492,17 +:025,189,70 | +007, 601,99 


Baw: 


(a) | -001,562,59 — -016,357,80 | — 000,196,08 | + °013,951,10 | +°010,408,16 4+ 1024,456,57 * ‘010, 780,30 
5 | (b) |—-001,096,19 —-014, 410, 34 | —*000,106,48 | +°014,492,89 + 010,926,94 | + °025,583,17 |+ 010,698, 56 
| (ce) |—-000,731,63 — 012'372,26 — -000,003,49 | + 015,100,01 +°011,507,01 +026, 761,82 | + 010,435,95 
| Reese Baers ek es 





(a) |—-000,728,92 —-010,344,20 | + -000,068,82 | 4+-013,421,77 + °011,082,88 | + °029,840,53 + -016,766,62 








6 | (b) |—-000,448,58 |— -008,432,81 +-000,177,88 +°013,793,06 +°011,705,64 + -032,119,62 | 4+-017,871,20 
(e) he .255, 73 — -006,613,75 | + 000,295,92 + 014,164,31 (+ 012,382,26 +084, 601,52 | +-018,889,99 

(a) |—-000,090, 63 - “002,150, 92 | +-000,121,52 +°005,185,57 | +°005,018,14 | + 016, 96° 5,98 | +-014,515,02 

7 | (b) |—-000,037,11 |—-001,530,11 +-000;149,03 +-005,035,92 | + -005,142,06 + 018,430,12 | +016, 770,70 
| (c) 2 000,000,75 | — -001,037,56 | -+-000,167,89 + :004.808,83 -+-005,217,99 | +-019,928,67 | +-019,271,47 











TABLE XII. Values of $\bar{h}_{s\cdot}$ for Columns.

r | s=1 | s=2 | s=3 | s=4 | s=5 | s=6 | s=7
.45 | -2.19299 | -.92114 | -.02549 | +.56650 | +.98336 | +1.45054 | +2.30569
.50 | -2.18505 | -.91848 | -.02538 | +.56616 | +.98315 | +1.44900 | +2.29880
.55 | -2.17567 | -.91485 | -.02521 | +.56580 | +.98286 | +1.44685 | +2.28999
Actual $\bar{h}_{s\cdot}$ | -2.19667 | -.91404 | -.02553 | +.56594 | +.98333 | +1.44723 | +2.29464
Extra- or interpolated r | .4999 | .5599 | .4167 | .5309 | .4585 | .5426 | .5249
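The inverse interpolation that leads from the three computed values of each column to its own r can be sketched as follows. This is an illustration only: it assumes a three-point (quadratic) Newton interpolation of the columnar value against r, which the text does not state explicitly, and the function name `columnar_r` is hypothetical. Under that assumption the entries for columns 3, 4 and 5 of Table XII are recovered to about the fourth decimal.

```python
import math

def columnar_r(h_at_r, h_actual, r0=0.45, step=0.05):
    """Invert a three-point (quadratic) Newton interpolation of h against r.

    h_at_r   -- h computed at r0, r0 + step and r0 + 2*step
    h_actual -- the observed columnar value to be matched
    """
    f0, f1, f2 = h_at_r
    d1 = f1 - f0                 # first forward difference
    d2 = f2 - 2.0 * f1 + f0      # second forward difference
    # h(u) = f0 + u*d1 + u*(u - 1)/2 * d2, where r = r0 + u*step.
    a = d2 / 2.0
    b = d1 - d2 / 2.0
    c = f0 - h_actual
    if a == 0.0:
        u = -c / b               # purely linear case
    else:
        disc = b * b - 4.0 * a * c
        # Take the root that tends to the linear answer -c/b as d2 -> 0.
        u = -2.0 * c / (b + math.copysign(math.sqrt(disc), b))
    return r0 + u * step

# Columns s = 3, 4, 5 of Table XII: (h at r = .45, .50, .55), actual h.
table_xii = {
    3: ((-0.02549, -0.02538, -0.02521), -0.02553),
    4: ((0.56650, 0.56616, 0.56580), 0.56594),
    5: ((0.98336, 0.98315, 0.98286), 0.98333),
}
interpolated = {s: columnar_r(h, actual) for s, (h, actual) in table_xii.items()}
```

Column 3 is a case of extrapolation, since its actual value lies outside the three computed ones; the same formula serves, with u negative.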









We have thus the values of r found from each column*. 
We now turn to Table X and calculate in exactly the same way the values of 
Nee = 3 Te Be Te +7967, Be Ty +... +19 Be Tye Ty + ...; 
for the r peculiar to each column for that column. We thus obtain Table XIII.
TABLE XIII. 
Values of $\lambda'_{ss'}$ for r of each Vertical Column.








3’ s=1 s=2 s=3 s=4 s=5 s=6 s=7 | 
eis = pot se ees weer Tk, ee | 

| , Ar or , | 
1 \- 013,707,54 | — 044,362,34 | — 013,376,98 | — 002,394,74 | — -000,770,04 | — ‘000,212,73 000,006, 10 | 
2 


021,31 5,40 | — 153,841,07 | —-074,192,36 | —-029,418,02 , — -009,240,63 | — -006,036,02 | — 000,641 41 
3 | —-000,771,58 | — -008,202,77 | — 004,826.27 —-002,225,87 | — -000,633,67 | — -000,217,60 | + -000,030,11 
4 + -000,880,64 | +:014,176,58 | +°018,829,61 | +-015,680,02 +005,954,01 \+ 008,797,09 + -001,837,83 | 
5 |+-000,7! 59,52 | +-013,488,41 | + -024,318,82 | + -022,680,28 | + -009,395,75 | +016, 257,72 + 004,194.22 | 
6 \+ 000,584,54 | +-011,211,78 | +031,232,08 + -032,630,32 | +-015,526,73 | +°032,207,15 + -011,205,66 | 
7 | +-000,127,47 | +-002,739,83 +015,206,16 | +017,131,65 4-010,87 786 | + 028,515,80 + 017,104,711 | 








k, --963,72 °|--507,66 |--010,29 | +-303,04 + 425,95 a 789,11 — |-+1-238,75 
| | i 
| - | Z ‘ ‘ 


The values in Table XIII divided by $\bar{n}_{ss'}/n_{ss'}$ from Table XIV and summed for
each column give, on multiplication by $N/n_{s\cdot}$, the $\bar{k}_{s\cdot}$ of the last row of the table.

To obtain Table XIV we must return to Equation (x), use the appropriate r for
the column and the values in Table III of $\delta_s\tau_p\,\delta_{s'}\tau_p'$. Taking $\sigma_x$ and $\sigma_y$ as units
of the horizontal and vertical variates we can plot $\bar{k}_{s\cdot}$ in Table XIII to $\bar{h}_{s\cdot}$ from
Table XII and so obtain the regression line as formed by the means of each column,
and set against it the regression lines as found from polychoric r = .5034, or .5204.


TABLE XIV.

Values of $\bar{n}_{ss'}/n_{ss'}$ for Columnar Values of r.











s' | s=1 | s=2 | s=3 | s=4 | s=5 | s=6 | s=7
1 | 1.481,160 | .918,247 | .881,901 | ∞ | .366,258 | ∞ | ∞
2 | .850,270 | .995,746 | .946,290 | 1.319,975 | 1.351,776 | 1.261,565 | ∞
3 | .910,673 | 1.075,087 | 1.090,803 | .830,945 | .853,291 | .876,250 | 1.668,232
4 | 1.846,116 | 1.016,776 | 1.066,432 | .856,640 | .855,012 | 1.248,823 | .600,417
5 | ∞ | .866,469 | 1.028,801 | .992,395 | .968,288 | 1.018,112 | .937,952
6 | ∞ | .980,885 | .887,677 | 1.263,099 | 1.619,040 | .803,917 | .999,089
7 | ∞ | .454,171 | .808,851 | 1.368,315 | .848,918 | 1.316,691 | 1.074,517





* The mean value of r weighted with the column totals is .5022, which is in reasonable accord with
(i.e. within the probable error of) the results on p. 142.















This is done in Diagram I. But what we actually desire is to compare the obser- 
vations and the regression lines as given by the present polychoric method with 
those obtained by product-moment methods. 





[Diagram I. Vertical axis: Stature of Son in Inches; horizontal axis: Stature of Father in Inches; both scaled from -2 to +2.]


Our actual data from which the table on p. 135 was obtained are given in
Table XV. The following are the values of the constants in inches:
Mean Stature of Father: $\bar{x}$ = 67.878.
Mean Stature of Son: $\bar{y}$ = 68.845.
Standard Deviation of Father: $\sigma_x$ = 2.6576.
Standard Deviation of Son: $\sigma_y$ = 2.6855.

Correlation of Father and Son: r = .5189 ± .0160.


In Diagram II the regression line (slope, .5245) with means of the arrays as
dark circles is given. Against this we have put as hollow circles the values of
$\bar{h}_{s\cdot}$ and $\bar{k}_{s\cdot}$ multiplied by their respective S.D.'s to indicate the result as worked
out in the present paper. The closeness of the polychoric coefficient .5204 and
the product-moment coefficient does not permit of two regression lines being
drawn. It will be seen that the fit to the observations by use of broad categories
and the polychoric method is really quite as satisfactory as the fit by the
product-moment method. But the amount of arithmetical work is incomparably greater
by the former, even if it be less than Ritchie-Scott's process with 49 cells would be.


Accordingly we now proceeded to investigate the extent to which approximations
shortening the arithmetic would introduce serious error. The first question
to be answered is: To what extent in finding the means $\bar{k}_{s\cdot}$ of the arrays is it
needful to use the actual value of the correlation coefficient as found for each
column? In order to test this we proceeded to find the $\bar{k}_{s\cdot}$ for each columnar





[Diagram II. Horizontal axis: Stature of Father in Inches; vertical axis: Stature of Son in Inches.]





Stature of Son. 




TABLE XV. 
Correlation of Stature in 1000 pairs, Father and Son.
Stature of Father.


























| } | H | ! | | 

| PGR ERG REPU RGPeR ES aR urea eres: ley 

= = 3 & | % SS | Ds % e | % Sal/silH xls ls] | 

i” © | © Sui 1S | © oS Oo | © o | > =~ ~ | > a | 

pee SE, Rill PER ES, Bey Pee, ie ER: PN I PRE POS Dia Ace, BR) mem, tH | sail 2 | 

ce Fe ses es ee | ee Bee | oe es ee | oe ee 

| 61"°875—| 1 eh oe ee eek ee eg ey ee eee SO, ee ee 6 | 
| ee-s75—T —|—| 3] 4] 4] 8| 1] 3] 2}]—|] 1,—/—|/—/|—/;-—|—] @ 
| 6s"-875—] 1 |—| 1] 5 | | 6| @9}] 8) 2) tlh —) mle | — | H | HT 32 
| 64"875-1 2/3) 8 | Sin] 1 | wl] 4) t1 s.—|] 1)—\l=|—(=—2 
65"875—1 1 | 1 | 2| 6| 9|10|;20]}17/15| 7| 6) —| — =e 
| 66"875—] 1 |— 6] 4/11) 24 21/28/10}/12] 7) 4) 1)/—}— —j} 199 
| 67"-875—] — | 2 | 2] 7] 9 | 20! 16] 33 | 27| 26) 20) 13} 6; —|—)| —| —f 181 
| Gs B75—a 1 | — | 211-4 | 1/12/13] 10] 22] 96|24| 6] 2] 2} 1})—}]—J 13 
69"-875—| — oe | } 5/11 | 15] 18/18] 23)}18 13) 4) 4) 1) 1 | —f 131 
| 70"-875— —j—/ 3/ 5] 4/18/18) 18/18} 8] 7] 3) 1 }—|]— 80 
| 71"876—9 — | —| —|— | —] 4] 1] 7] 7] Oo] @] 9) 7) S| 2 |—l|—a 8 
Es = SES eet sees Sere te Oe ee ee ee Ye 
teas | | [et S|] ee ee) PP ee ea ee ee eee 
TE S55 — | — | — | ~ | — |) —~| — | —|] 2] Bil Pi—} 812) —]— s 
ee eS eee oes ee | ot | a) SP eee eae 1 
76°°875—§ — | — | — | — | — | — Say ee, eee, by BE Hy Ss a ea 2 
77"875— | —| — —|— | — ras ea eed (Mf Se i eae en es 3 
19°'S75——§ — | — | —|— | — | —| —| —| —| —] 2) —| —] —)|—} —1 = 1 

| | | | | | | 

Totals | 7 | 6 | 17 | 36 | 63 |109 | 111 | 149 | 139 125 | 109 | 67 | 34 | 18 | 7 | 3 | 0 | 1000 

















array for the same correlation coefficient, and we took for the value of that
coefficient .5000, somewhat under the value found by either polychoric coefficient.

Table XVI gives our results. It involved finding a new series of values for
$\lambda'_{ss'}$, but those for $\bar{n}_{ss'}/n_{ss'}$ have already been computed under (b) in Table IV. The
results are given in terms of inches.

TABLE XVI.

Columnar Means by Different Processes.

s | $\bar{h}_{s\cdot} \times \sigma_x$ | $\bar{k}_{s\cdot} \times \sigma_y$, each column its own r | $\bar{k}_{s\cdot} \times \sigma_y$, each column for r = .50 | $\bar{k}_{s\cdot} \times \sigma_y$, each column assumed Normal | Common base
1 | -5.8379 | -2.5881 | -2.6498 | -2.4809 | [illegible]
2 | -2.4292 | -1.3633 | -1.3531 | -1.4357 | 3'
3 | -.0678 | -.0276 | -.0176 | -.0701 | 3'
4 | +1.5040 | +.8138 | +.8087 | +.7632 | 3'
5 | +2.6133 | +1.1439 | +1.1511 | +1.0866 | 3' + 4'
6 | +3.8462 | +2.1192 | +2.1122 | +2.1744 | 4' + 5'
7 | +6.0982 | +3.3267 | +3.3194 | +3.1109 | 5'






















An examination of the fourth column of Table XVI shows us that we have
not for practical purposes seriously modified the columnar means by using r = .50
instead of the individual value for each column. This is illustrated in Diagram III,
where except in the case of the first array there is hardly daylight between the
two series of points.








[Diagram III. Vertical axis: Stature of Son in Inches; horizontal axis: Stature of Father in Inches; both scaled from -2 to +2.]


In Diagram III the hollow circles give the means with r obtained for each column, the nearly
superposed dark circles the means with r = .5000.


The solution of the problem therefore falls back on Equations (x), (xvit) and 
(xv). We should still have to calculate $,7,, 3,7)’, 3, 7, and $, 7,’, but we should 
only need the three series of products 9,7, Sy Ty’, 35 7) Sy T,’ and 9,4 Ty Sw sah and 
to obtain $\bar{k}_{s\cdot}$ it would be adequate to use a value of r for which $\bar{n}_{ss'}/n_{ss'}$ had been
found for the final interpolation. Still this involves very lengthy arithmetic, and 
we naturally crave for a still easier process. The present full working out of a 
numerical example enables us for the first time really to test the adequacy of an 
easier method of dealing with such polychoric tables which has been long in use 
as an approximate method in the Biometric Laboratory. 


(6) It is clear that if we could find the means of the columnar arrays, we 
could readily obtain the correlation and the regression line by aid of the correlation 
ratio corrected for class index. The whole problem accordingly turns on a ready 
means of reaching—at any rate—an approximate value of the mean of a columnar 
array. This array is the slice between two parallel planes of a normal correlation 
surface. 


In the case of a surface of zero correlation

$$z = z_0\, e^{-\frac{1}{2}(X^2 + Y^2)/\sigma^2},$$

the slice between $X_1$ and $X_2$ has for its volume on $dY$

$$\int_{X_1}^{X_2} z_0\, e^{-\frac{1}{2}X^2/\sigma^2}\, dX \times e^{-\frac{1}{2}Y^2/\sigma^2}\, dY;$$

the slice is therefore given by the normal curve:

$$\text{Ordinate} = \text{const.} \times e^{-\frac{1}{2}Y^2/\sigma^2}.$$

It seems therefore not unreasonable after the surface of revolution is stretched
and slid into a correlation surface to assume the slice to be still approximately a
normal curve. Unfortunately the determination of the best mean and standard
deviation for normal material given in broad categories does not admit of very easy
solution. What we need is the difference between the means of a columnar array
and of a marginal frequency as a multiple of the standard deviation of the latter.
We shall obtain results differing more or less from each other according to the
individual broad category we take as the basis of comparison between $\sigma_s$, the
standard deviation of the sth slice, and $\sigma_y$, the standard deviation of the marginal
frequency. In fact the range of any broad category or of any combination of broad
categories, except the tail categories, can be made a means of linking up $\sigma_s$ and $\sigma_y$.
A little experience, however, shows (a) that it is undesirable to find the $\sigma_s$ of
any array from a category of small frequency, and (b) that for arrays of small total
frequency symmetrical tripartite divisions as far as feasible are the best*. The last
column in Table XVI shows the system selected for each of our columnar arrays.

Take, for example, s = 5; the columnar array may be taken on the base of 3'
and 4' categories as

1' + 2' | 9 | | 335
3' + 4' | 36 | and compared with | 421
5' + 6' + 7' | 24 | | 244
Totals | 69 | | 1000

as the corresponding marginal distribution. The corresponding proportional
frequencies up to the dichotomic planes are .1304, .6521 and .3350, .7560. The
distances of the mean† from the two dichotomic planes in the first case are

$-1.1245\,\sigma_5$ and $+.3910\,\sigma_5$,

and in the second case

$-.4261\,\sigma_y$ and $+.6935\,\sigma_y$,

where $\sigma_5$ is the standard deviation of the normal curve assumed to represent the
columnar array 5. Accordingly the range of 3' + 4' categories

$= 1.5155\,\sigma_5 = 1.1196\,\sigma_y$,

which gives $\sigma_5$ in terms of $\sigma_y$.
* The probable error of a standard deviation found in this way is discussed in Biometrika, Vol. XII, p. 129.
† Found from the Probability Integral Table.










Hence the distance between the means is

$.6935\,\sigma_y - .3910\,\sigma_5$

$= \{.6935 - .3910 \times 1.1196/1.5155\}\,\sigma_y$

$= .4046\,\sigma_y$

= 1.0866, if we introduce the value of $\sigma_y$.
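The arithmetic of this example can be retraced in a few lines. The sketch below is an illustration only: it takes the middle frequency of the array as 36 (that is, 69 - 9 - 24), uses the inverse normal integral from Python's standard library in place of the Probability Integral Table, and the function name `slice_mean_offset` is hypothetical. It returns the distance between the two means in units of $\sigma_y$.

```python
from statistics import NormalDist

def slice_mean_offset(array_counts, marginal_counts):
    """Mean of a three-category array minus the marginal mean, in units of sigma_y."""
    z = NormalDist()  # standard normal curve

    def cut_points(counts):
        # Distances of the two dichotomic planes from the mean, in units of
        # the standard deviation of the distribution the counts belong to.
        total = sum(counts)
        lower = z.inv_cdf(counts[0] / total)
        upper = z.inv_cdf((counts[0] + counts[1]) / total)
        return lower, upper

    a_lo, a_hi = cut_points(array_counts)        # in units of sigma_s
    m_lo, m_hi = cut_points(marginal_counts)     # in units of sigma_y
    sigma_ratio = (m_hi - m_lo) / (a_hi - a_lo)  # sigma_s / sigma_y
    # The upper plane lies m_hi*sigma_y above the marginal mean and
    # a_hi*sigma_s above the array mean; the difference is the offset sought.
    return m_hi - a_hi * sigma_ratio

offset = slice_mean_offset((9, 36, 24), (335, 421, 244))
```

Because the proportions are not rounded to four figures before inversion, the result agrees with the .4046 of the text only to about the fourth decimal.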


This and the corresponding values are recorded in the fifth column of 
Table XVI. It will be seen that these values approximate to those in the third 
column, the greatest differences being in the small first and last arrays. 

Of course in actually working with material solely given in broad categories we
use the value .4046, treating $\sigma_y$ as our unit of measurement. The means of the
columnar arrays can be found with great ease and with considerable approximation
by this method.


If we now proceed to take the mean of our means duly weighted with their
frequencies, we find it to be -.0510, not a very serious divergence from zero.
However, we subtract it* from the means in the fifth column of Table XVI,
multiply the squares of the remainders by the corresponding frequencies, sum and
divide by the square of $\sigma_y$. Thus we obtain

$$\eta^2 = \frac{1{,}818.5034}{7{,}211.9108} = .252,$$

or: $\eta$ = .502148.

If we divide by the class index correlation of the x-variate, i.e. .962,329†, we
obtain

$\eta$ = .5218,

which correlation ratio we may take to be the correlation coefficient and compare
with our polychoric coefficient .5204 (p. 142). Clearly although our means as found
by the hypothesis of normal distribution of the columnar arrays agree only
approximately with the polychoric means of the third column of Table XVI, they lie
practically on the same regression line, as Diagram IV indicates. We conclude, 
therefore, that in this case as probably in many like cases, it is quite adequate to 
obtain the means of the columnar arrays by treating them as normal distributions, 
then determining their correlation ratio and correcting it for the class index. The 
corresponding regression line with the means of the columnar arrays indicated 
will be for many purposes an adequate graph showing the general nature of the 
correlation. 
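The closing arithmetic of the correlation ratio may be checked as below. The figures are an assumption of this sketch wherever the print is damaged: the weighted sum of squares is read as 1,818.5034 and N times the square of $\sigma_y$ as 7,211.9108 (with N = 1000), readings chosen because they reproduce the stated $\eta$ = .502148.

```python
import math

# Readings of the printed figures (assumed where the digits are damaged).
sum_sq = 1818.5034        # sum of frequency x (array mean - general mean)^2, in square inches
n_sigma_y_sq = 7211.9108  # N x sigma_y^2, with N = 1000
class_index = 0.962329    # class-index correlation of the x-variate

eta_raw = math.sqrt(sum_sq / n_sigma_y_sq)  # correlation ratio before correction
eta_corrected = eta_raw / class_index       # corrected for class index
```

The corrected value rounds to .5218, the figure set against the polychoric coefficient .5204.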

The general purpose of this paper has now been fulfilled; it has been shown 
how a general polychoric coefficient covering all the data provided in a given 
contingency table may be found, and how a graph may be drawn representing such 
a table effectively. At the same time such a process is very laborious and probably 
will not be lightly undertaken or only in cases of grave uncertainty. The method 


* Correlation ratio without subtraction = .5222.
† See p. 142.





[Diagram IV. Horizontal axis: Stature of Father in Inches; vertical axis: Stature of Son in Inches.]

























is one of fitting the “best” normal surface to the data subject to the limitation 
that the marginal totals are exactly reproduced, and this limits the generality. 

An example has been given of the process, but it is seen from this example that 
the heavy arithmetic does not lead us to any more accurate value for the correlation 
than far simpler methods. Thus: 


Correlation from product-moment = .5189 ± .0160.
Polychoric Correlation Coefficient "Best Fit" = .5034.
Polychoric Correlation Coefficient "Product Moment" = .5204.
Mean Square Contingency, Corrected for Class Indices = .5179.
Correlation Ratio from means of arrays = .5218.


The latter method, which has been long in use in the Biometric Laboratory, is
thus, when used with due precaution, seen to be justified by the theoretically
preferable polychoric method. If a method could be discovered of finding uniquely
the mean of a columnar array, using all its cells at the same time, this method
would still more effectively replace the polychoric correlation coefficient.




















ON EXPANSIONS IN TETRACHORIC FUNCTIONS. 
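By JAMES HENDERSON, M.A., B.Sc.
(1) WE define the tetrachoric function of order s to be $\tau_s(x)$, where

$$\tau_s(x) = \frac{1}{\sqrt{s!}}\left(-\frac{d}{dx}\right)^{s-1} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2} \qquad \text{(i)}.$$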


Other writers have adopted various other values for the external numerical
factor but this is immaterial. The factor $1/\sqrt{s!}$ was chosen because it gives an
extremely simple expression for the volume of a quadrant of the normal bivariate
frequency surface, and because for tabulating the numerical values of the functions
it is necessary to have some reduction factor of this kind to keep them of
manageable size. We can usually drop the argument x and speak of $\tau_s$. The values of $\tau_s$
for s = 1 up to s = 6 are tabled to five decimal places in the book, Tables for
Statisticians and Biometricians*, for values of $\frac{1}{2}(1-\alpha)$ (which is really $\tau_0$ when
the argument is negative) from .000 to .500 at intervals of .001. With a different
multiplier they have been tabled by Charlier† to four decimal places only for
s = 1, 4 and 5 (x = .00 to 3).


The general form of the tetrachoric function of order s is

$$\tau_s = \frac{1}{\sqrt{s!}}\left\{x^{s-1} - \frac{(s-1)(s-2)}{2\cdot 1!}\,x^{s-3} + \frac{(s-1)(s-2)(s-3)(s-4)}{2^2\cdot 2!}\,x^{s-5} - \text{etc.}\right\}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2} \qquad \text{(ii)},$$

that is, the ordinate of the normal curve of errors multiplied by a polynomial of
degree (s - 1). $\tau_1$ is simply the ordinate of the normal curve, while $\tau_0$ is the area
of the tail of the normal curve up to a given abscissa x, with the addition of an
arbitrary constant. This constant may be so selected that

$$\tau_0 = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\, dx,$$

and will be found from the tables of the probability integral. It will be equal to
$\frac{1}{2}(1+\alpha)$, if x is positive and $\frac{1}{2}(1-\alpha)$, if x be negative in the usual notation.
Accordingly the expansion of a function of x, f(x), in a series of tetrachoric
functions is really the expansion of the difference of the function and a multiple
of the probability integral in terms of

$$\left(c_0 + c_1\frac{x}{\sigma} + c_2\frac{x^2}{\sigma^2} + \ldots\right) e^{-\frac{1}{2}x^2/\sigma^2},$$

where $\sigma$ and $c_0, c_1, c_2, \ldots$ are at our choice.


* Cambridge University Press, p. 1, and pp. 42—51. 
† Vorlesungen über die Grundzüge der mathematischen Statistik, 1920.








158 On Expansions in Tetrachoric Functions 


The real reason for adopting

$$c_0'\tau_0 + c_1'\tau_1 + c_2'\tau_2 + c_3'\tau_3 + \ldots,$$

instead of the above expression, is that the calculation of the constants $c_0', c_1', c_2' \ldots$
is more direct than that of $c_0, c_1, c_2 \ldots$ because the tetrachoric functions are
semi-orthogonal functions*. It will be seen that the problem of expansion in
tetrachoric functions is closely related to a theorem of Laplace. If $U$ be a unimodal
function of $x$ within the range under discussion and the integral $J = \int U\,dx$ be
required, Laplace transfers to the mode $m$ as origin so that $x = m + \xi$ and writes $U$
in the following form:

$$U = U_m\,e^{-\frac12\xi^2/\sigma^2}\,(1 + a_1\xi + a_2\xi^2 + \ldots).$$


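He extends the limits to $\infty$ in both directions by supposing $U = 0$ outside the given
range and in the integration applies the well-known values of
$\int_{-\infty}^{\infty} e^{-\frac12\xi^2/\sigma^2}\,\xi^s\,d\xi$, i.e.,
zero if $s$ be odd, and again if $s$ be even $(= 2r)$,

$$\int_{-\infty}^{\infty} e^{-\frac12\xi^2/\sigma^2}\,\xi^{2r}\,d\xi = (2r-1)(2r-3)\ldots 3\cdot 1\cdot\sqrt{2\pi}\,\sigma^{2r+1}.$$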
—o 

It will be seen that Laplace is really proceeding by expansion in tetrachoric
functions as the process is precisely the same whatever be the limits of the integral
of $U$. Following Laplace we develop our function in “incomplete normal moment
functions,” i.e. $\int_{-\infty}^{x}\dfrac{x^s e^{-\frac12 x^2}}{\sqrt{2\pi}}\,dx$†; it is better to use tetrachoric functions. The series in
tetrachoric functions seems to converge slightly better than that in incomplete
normal moment functions.
If we have

$$F(x) = a_1\tau_1 + a_2\tau_2 + a_3\tau_3 + \ldots + a_s\tau_s + \ldots,$$

then, assuming we may integrate the right-hand side of this equation term by
term (i.e. assuming uniform convergence) between $x$ and $\infty$,

$$\int_x^{\infty} F(x)\,dx = a_1\tau_0 + a_2\frac{\tau_1}{\sqrt2} + a_3\frac{\tau_2}{\sqrt3} + \ldots,$$

since

$$\int_x^{\infty}\tau_s\,dx = \frac{\tau_{s-1}}{\sqrt s} \qquad \text{(iii)}.$$


* A series of functions $f_1(x), f_2(x) \ldots f_s(x) \ldots f_{s'}(x)$ is orthogonal if $\int f_s(x)f_{s'}(x)\,dx = 0$ when $s$ and
$s'$ are not equal, the integration being throughout the range. They are semi-orthogonal if

$$\int f_s(x)\,f_{s'}(x)\,\phi(x)\,dx = 0,$$

$\phi(x)$ being a function of $x$ peculiar to the series. In other words a system is orthogonal if the sums of
the products of different order functions vanish without weighting for $x$. A system is semi-orthogonal
if we require to weight the values of $x$ to obtain the vanishing of the product sum. This weighting is
the great disadvantage of semi-orthogonal functions. In our case of the tetrachoric functions the weighting
factor is $e^{\frac12 x^2}$ or the tails of series are excessively weighted.

† Discussed Biometrika, Vol. VI. p. 59. Tables of these functions up to s=10 are given in Tables
for Statisticians, pp. 22—2.










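Let $\tau_s = \dfrac{1}{\sqrt{s!}}\dfrac{1}{\sqrt{2\pi}}\,p_{s-1}\,e^{-\frac12 x^2}$, where $p_{s-1}$ is the polynomial in $x$ of degree $(s-1)$
in (ii).

Let $\tau_{s'}$ be another tetrachoric function and suppose $s'$ is greater than $s$. Then

$$\int_{-\infty}^{\infty}\tau_s\tau_{s'}\,e^{\frac12 x^2}\,dx = \frac{1}{\sqrt{s!}\sqrt{s'!}\sqrt{2\pi}}\int_{-\infty}^{\infty} p_{s-1}\left(-\frac{d}{dx}\right)^{s'-1}\frac{e^{-\frac12 x^2}}{\sqrt{2\pi}}\,dx.$$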


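Now since $\tau_s(\infty)$ and $\tau_s(-\infty)$ will always be zero owing to the exponential
factor $(s > 0)$ we can integrate by parts transferring the $\dfrac{d}{dx}$ from the exponential
to the polynomial, therefore

$$\int_{-\infty}^{\infty}\tau_s\tau_{s'}\,e^{\frac12 x^2}\,dx = \frac{1}{\sqrt{s!}\sqrt{s'!}\sqrt{2\pi}}\left\{\left[-p_{s-1}\left(-\frac{d}{dx}\right)^{s'-2}\frac{e^{-\frac12 x^2}}{\sqrt{2\pi}}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty}\left(-\frac{d}{dx}\right)^{s'-2}\frac{e^{-\frac12 x^2}}{\sqrt{2\pi}}\,\frac{d\,p_{s-1}}{dx}\,dx\right\}.$$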


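The integrated part at every step vanishes at the limits and ultimately

$$\int_{-\infty}^{\infty}\tau_s\tau_{s'}\,e^{\frac12 x^2}\,dx = \frac{1}{\sqrt{s!}\sqrt{s'!}\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{d^{\,s'-1}p_{s-1}}{dx^{\,s'-1}}\,\frac{e^{-\frac12 x^2}}{\sqrt{2\pi}}\,dx.$$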





Since $p_{s-1}$ is a polynomial of degree $(s-1)$ and $s'$ is $> s$ the differential of the
polynomial vanishes, i.e.

$$\int_{-\infty}^{\infty}\tau_s\tau_{s'}\,e^{\frac12 x^2}\,dx = 0 \qquad \text{(iv)}.$$

If $s'=s$ then the differential of $p_{s-1}$ reduces to $(s-1)!$ so that

$$\int_{-\infty}^{\infty}\tau_s^2\,e^{\frac12 x^2}\,dx = \frac{1}{\sqrt{2\pi}}\,\frac{(s-1)!}{s!}\int_{-\infty}^{\infty}\frac{e^{-\frac12 x^2}}{\sqrt{2\pi}}\,dx = \frac{1}{s\sqrt{2\pi}} \qquad \text{(v)}.$$


These equations (iv) and (v), which give the fundamental properties of the
tetrachoric functions, enable us to expand any function $F(x)$ in terms of tetrachoric
functions if we can find the value of the integral

$$\int_{-\infty}^{\infty}F(x)\,\tau_s\,e^{\frac12 x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{s!}}\,p_{s-1}\,F(x)\,dx \qquad \text{(vi)}.$$

Since $p_{s-1}$ is an integral function of $x$, this amounts to saying that we can
expand any function of which we are able to determine the successive moment-coefficients.


The practical value of the functional expansion when obtained is, however, 
a very different matter. That depends on the convergency of the series and our 
experience has shown us that in the most common cases the convergency is so 
slight or non-existent as to render the expansion idle. 
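The semi-orthogonal property (iv) and the normalising value (v) can be checked numerically. The following is only a sketch under stated assumptions: the helper names `tau` and `weighted_product` are illustrative and not from the paper, the polynomial $p_{s-1}$ is built by the standard three-term recurrence for the polynomials in (ii), and the infinite range is replaced by a trapezoidal sum over $(-12, 12)$.

```python
import math

def tau(s, x):
    """Tetrachoric function of order s >= 1: p_{s-1}(x) * normal ordinate / sqrt(s!)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    prev, poly = 0.0, 1.0                  # p_{-1} (placeholder) and p_0
    for n in range(1, s):                  # p_n = x * p_{n-1} - (n - 1) * p_{n-2}
        prev, poly = poly, x * poly - (n - 1) * prev
    return poly * phi / math.sqrt(math.factorial(s))

def weighted_product(s, t, half_width=12.0, steps=24000):
    """Trapezoidal estimate of the integral of tau_s * tau_t * exp(x^2 / 2)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * tau(s, x) * tau(t, x) * math.exp(0.5 * x * x)
    return total * h

# (iv): different orders give zero; (v): equal orders give 1 / (s * sqrt(2 * pi)).
print(weighted_product(3, 5))
print(weighted_product(4, 4), 1.0 / (4.0 * math.sqrt(2.0 * math.pi)))
```

The truncation at $\pm 12$ is an assumption of convenience; the integrand is of order $e^{-72}$ there, so the neglected tails are far below machine precision.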










The matter is a very important one, for Thiele*, Edgeworth† and Charlier‡
have proposed to treat skew frequency distributions by a process, which amounts
to the same thing as the expansion by tetrachoric functions.

An attempt made many years ago§ to expand Incomplete Γ- and B-functions
by Laplace’s method in Incomplete Moment Functions convinced Professor Pearson
that little was to be gained by a series expansion in the form of a polynomial
multiplied by the ordinate of a normal curve. A variant of this method, that of
expressing Incomplete Γ- and B-functions in a series of tetrachoric functions, was
tried a year ago and it was found that except for a small distance round the mode
this method of expressing a frequency distribution was quite ineffectual. The
matter is of considerable importance because quite recently a Scandinavian actuary
in America‖ has been analysing mortality curves by tetrachoric functions and
asserts not only that they give a good fit but apparently believes that each function
of the series has some natural physiological meaning! It is quite possible to represent
the survivors of 100,000 persons born in the same year of life by a Fourier’s
series from 0 to 100 years but one would hardly claim any special physiological
significance for the individual periodic terms¶. Such a series however is far
easier to deal with in later treatment, such as differencing, than a series in
tetrachoric functions.

For the numerical calculation of the tetrachoric functions the difference
equation of these functions is invaluable, i.e.

$$\tau_s = x\beta_s\tau_{s-1} - \gamma_s\tau_{s-2},$$

where $x$ is the argument of the functions and

$$\beta_s = \frac{1}{\sqrt s}, \qquad \gamma_s = \frac{s-2}{\sqrt{s(s-1)}}.$$

Tables of $\beta_s$ and $\gamma_s$ are given in Tables for Statisticians (p. 1 of introduction)
to five decimal places for s = 7 to s = 24 (the first six tetrachoric functions being
given on pp. 42-51) and in Biometrika, Vol. XIV. p. 130 to 7 decimal places.

For our work $\beta_s$ and $\gamma_s$ were required to 7 places (sometimes to 8) to obtain
the requisite accuracy. The procedure consists in calculating $\tau_1$, which is equal to
$\dfrac{e^{-\frac12 x^2}}{\sqrt{2\pi}}$, directly to the required degree of accuracy and then by means of the tables
referred to above the higher tetrachoric functions are obtained in rapid succession
on the machine for a given value of the argument. In the testing of our tetrachoric
series seven-place accuracy was aimed at so that it was necessary to calculate
$\tau_1$ to eight places, which was done with the help of Vega’s ten-figure logarithms.
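The procedure just described, computing $\tau_1$ directly and then running the difference equation upwards, can be sketched as follows. This is an illustration only: the function names are hypothetical, and double-precision arithmetic stands in for the tables of $\beta_s$ and $\gamma_s$ and the calculating machine.

```python
import math

def beta(s):
    """beta_s = 1 / sqrt(s), as in the difference equation."""
    return 1.0 / math.sqrt(s)

def gamma(s):
    """gamma_s = (s - 2) / sqrt(s (s - 1))."""
    return (s - 2) / math.sqrt(s * (s - 1))

def tetrachoric_table(x, s_max):
    """Return [tau_1(x), ..., tau_{s_max}(x)] via tau_s = x beta_s tau_{s-1} - gamma_s tau_{s-2}."""
    tau_1 = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    taus = [tau_1]
    if s_max >= 2:
        taus.append(x * beta(2) * tau_1)   # gamma_2 = 0, so tau_0 is not needed
    for s in range(3, s_max + 1):
        taus.append(x * beta(s) * taus[-1] - gamma(s) * taus[-2])
    return taus[:s_max]

for s, value in enumerate(tetrachoric_table(-2.8, 6), start=1):
    print(s, f"{value:+.8f}")
```

Because the recurrence is applied to the signed argument, the change of sign of the even-order functions for negative arguments is produced automatically.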

* Forlaesninger over Almindelig Iagttagelseslaere, Kjøbenhavn, 1889.
† Royal Soc. Proc. Vol. LVI. p. 271, and in many papers, Journal of R. Statistical Society.
‡ Vorlesungen über die Grundzüge der mathematischen Statistik (Hamburg, 1920), p. 67.
§ Biometrika, Vol. VI. p. 68, 1908.
‖ Arne Fisher, Casualty, Actuarial and Statistical Society of America, Proceedings, Vol. IV. Part 1. No. 9.
¶ A normal curve, for example, is quite adequately represented by two or three periodic terms: see
Phil. Trans. Vol. CLXXXVI. A, p. 355, 1895.






















(2) It is well known that a wide range of frequency distributions can be 
adequately represented by one or other of the curves 


y=yoe-vla (1 + “ Pie ee (o| 
x m,—-1 e\m-1 [ (vii) 
and Y=Y (1 - > (1 a) trtstenestesees (0)| 


By a change of origin and the appropriate stretch or squeeze these may be
reduced to

$$y = y_0\,x^{p-1}e^{-x} \qquad (a)$$

and

$$y = y_0\,x^{m_1-1}(1-x)^{m_2-1} \qquad (b) \qquad \text{(vii) bis}.$$

Now, generally, it is not the ordinates of these curves which are required but
the areas of certain portions, or in other words the probability integrals of these
skew curves. The total range for (vii) bis (a) is 0 to $\infty$ and for (b) is 0 to 1; since

$$\int_0^{\infty}x^{p-1}e^{-x}\,dx = \Gamma(p)$$

and

$$\int_0^1 x^{m_1-1}(1-x)^{m_2-1}\,dx = B(m_1, m_2),$$

we may take these probability integrals to be

$$I(p, v) = \frac{1}{\Gamma(p)}\int_0^v v^{p-1}e^{-v}\,dv$$

and

$$B(v, m_1, m_2) = \frac{1}{B(m_1, m_2)}\int_0^v v^{m_1-1}(1-v)^{m_2-1}\,dv,$$

which are the ratios of the incomplete to the complete Γ- and B-functions.


The equations on p. 158 show us that if either of the frequency functions (vii) is
expressible in a series of tetrachoric functions their probability integrals (assuming
convergence) will also be. Now there is no doubt that a large mass of material does
not differ practically from the forms in (vii) and accordingly if the above probability
integrals cannot be adequately expressed in a series of tetrachoric functions, we
may be certain that tetrachoric functions do not furnish a suitable method of
representing skew frequency. Accordingly our problem reduces itself to the
following one: Can $I(p, v)$ and $B(v, m_1, m_2)$, or the Incomplete Γ- and B-functions,
be represented with adequate convergency by a series of tetrachoric functions?
After examination of the numerical and graphical results obtained, we are obliged
to conclude that the answer to this question is in the negative.


(3) Let us first consider the expansion in tetrachoric functions of the function

$$y = \frac{x^{p-1}e^{-x}}{\Gamma(p)} \qquad \text{(viii)}.$$

In expanding this expression there are at least two methods, which we ought
to consider, and one may have advantages over the other as far as convergency is
concerned. It may be expanded with regard: (i) to the mean and the standard
deviation, or (ii) to the mode in the manner of Laplace*.
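(i) The mean of the function (viii) is easily found to be at $x = p$, the mode is
at $x = p-1$ and the standard deviation is $\sqrt p$.
Referring to the mean as origin the function becomes

$$y = \frac{(\xi+p)^{p-1}e^{-(\xi+p)}}{\Gamma(p)} \qquad \text{(ix)}.$$

Let

$$y = \phi(-D)\,\frac{e^{-\xi^2/2p}}{\sqrt{2\pi}}, \quad\text{where } D = \frac{d}{dz} \text{ and } z = \frac{\xi}{\sqrt p} \qquad \text{(x)}.$$

Except for a numerical factor the right-hand side is a series of tetrachoric
functions.
Let $\phi(-D) = c_0 - c_1D + c_2D^2 \ldots (-1)^s c_sD^s + \ldots$
The function $\phi(-D)$ has to be determined, i.e. we require to find the successive $c$’s:

$$\frac{(\xi+p)^{p-1}e^{-(\xi+p)}}{\Gamma(p)} = \phi(-D)\,\frac{e^{-\xi^2/2p}}{\sqrt{2\pi}} = c_0\tau_1 + c_1\sqrt{2!}\,\tau_2 + c_2\sqrt{3!}\,\tau_3 + \ldots + c_{s-1}\sqrt{s!}\,\tau_s + \ldots \qquad \text{(xi)}.$$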
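To determine the $c$’s. With the origin at the mean the function $y$ must be
taken as zero from $-\infty$ to $-p$, while from $-p$ to $+\infty$ it is given by (ix). The
$c$’s will be obtained most easily by multiplying both sides of (x) by $e^{\theta\xi}$ and equating
the coefficients of powers of $\theta$ on both sides of the equation, i.e. we make all the
moments of the two expressions for the curve the same, for the coefficient of $\theta^s$ on
either side is the $s$th moment†. Thus

$$\int_{-\infty}^{\infty}y\,e^{\theta\xi}\,d\xi = \int_{-\infty}^{\infty}e^{\theta\xi}\,\phi(-D)\,\frac{e^{-\xi^2/2p}}{\sqrt{2\pi}}\,d\xi;$$

but $y = 0$ from $\xi = -\infty$ to $-p$. Accordingly

$$\int_{-p}^{\infty}e^{\theta\xi}\,\frac{(\xi+p)^{p-1}e^{-(\xi+p)}}{\Gamma(p)}\,d\xi = \int_{-\infty}^{\infty}e^{\theta\xi}\,\phi(-D)\,\frac{e^{-\xi^2/2p}}{\sqrt{2\pi}}\,d\xi.$$

Now $x = p+\xi$ and $z = \xi/\sqrt p$. Thus

$$\int_0^{\infty}\frac{e^{\theta(x-p)}\,x^{p-1}e^{-x}}{\Gamma(p)}\,dx = \sqrt p\int_{-\infty}^{\infty}e^{\theta\sqrt p\,z}\,\phi(-D)\,\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz.$$

The left-hand side is equal to

$$e^{-p\theta}\int_0^{\infty}\frac{x^{p-1}e^{-x(1-\theta)}}{\Gamma(p)}\,dx = e^{-p\theta}\int_0^{\infty}\frac{u^{p-1}e^{-u}}{(1-\theta)^{p-1}\,\Gamma(p)}\,\frac{du}{(1-\theta)} \qquad [\text{Let } x(1-\theta) = u]$$

$$= e^{-p\theta}(1-\theta)^{-p}.$$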
* Laplace’s method is really an expansion in incomplete normal moment functions but as we have 
seen (p. 158) these may be replaced by tetrachoric functions. 
+ We owe this elegant method of determining the c’s to Mr H. E. Soper. Originally the c’s were 


determined by use of the fundamental property of the tetrachoric functions but that method, while 
leading to the same result, is more laborious. 




























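To find the value of the integral on the right-hand side, consider the term
$c_s(-D)^s$ in the function $\phi(-D)$. Its contribution to the integral is

$$c_s\int_{-\infty}^{\infty}e^{\theta' z}\left(-\frac{d}{dz}\right)^{s}\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz,$$

where $\theta' = \theta\sqrt p$.

On integrating by parts the term between limits vanishes owing to the factor
$e^{-\frac12 z^2}$. Hence the integral

$$= c_s\theta'\int_{-\infty}^{\infty}e^{\theta' z}\left(-\frac{d}{dz}\right)^{s-1}\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz,$$

and ultimately

$$= c_s\theta'^{\,s}\int_{-\infty}^{\infty}e^{\theta' z}\,\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz = c_s\theta'^{\,s}\,e^{\frac12\theta'^2}.$$

Therefore the whole integral on the right is $\phi(\theta')\,e^{\frac12\theta'^2}$, i.e.

$$e^{-p\theta}(1-\theta)^{-p} = \sqrt p\,\phi(\sqrt p\,\theta)\,e^{\frac12 p\theta^2},$$

$$\therefore\ \sqrt p\,\phi(\sqrt p\,\theta) = e^{-p\theta-\frac12 p\theta^2}(1-\theta)^{-p},$$

and

$$\phi(\sqrt p\,\theta) = c_0 + c_1(\sqrt p\,\theta) + c_2(\sqrt p\,\theta)^2 + \ldots + c_s(\sqrt p\,\theta)^s + \ldots = c_0' + c_1'\theta + c_2'\theta^2 + \ldots + c_s'\theta^s + \ldots,$$

where $c_s' = c_s(\sqrt p)^s$ or $c_s = c_s'(\sqrt p)^{-s}$.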
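Now

$$e^{-p\theta-\frac12 p\theta^2}(1-\theta)^{-p} = e^{-p\theta-\frac12 p\theta^2-p\log(1-\theta)} = e^{-p\theta-\frac12 p\theta^2+p\theta+\frac12 p\theta^2+\frac13 p\theta^3+\frac14 p\theta^4+\ldots}$$

$$= e^{\frac13 p\theta^3+\frac14 p\theta^4+\frac15 p\theta^5+\ldots} = b_0 + b_1\theta + b_2\theta^2 + b_3\theta^3 + b_4\theta^4 + \ldots,$$

where

$$b_0 = 1,\quad b_1 = b_2 = 0,\quad b_3 = \tfrac13 p,\quad b_4 = \tfrac14 p,\quad b_5 = \tfrac15 p,\quad b_6 = \tfrac16 p + \tfrac12\left(\tfrac13 p\right)^2 = \tfrac1{18}\,p\,(p+3),\ \text{etc.}$$

But $\sqrt p\,c_s' = b_s$, therefore

$$c_0' = \frac{1}{\sqrt p},\quad c_1' = c_2' = 0,\quad c_3' = \tfrac13\sqrt p,\quad c_4' = \tfrac14\sqrt p,\quad c_5' = \tfrac15\sqrt p,\ \text{etc.},$$

so that

$$c_0 = \frac{1}{\sqrt p},\quad c_1 = c_2 = 0,\quad c_3 = \tfrac13\,\frac1p,\quad c_4 = \tfrac14\,\frac{1}{p\sqrt p},\quad c_5 = \tfrac15\,\frac{1}{p^2},\ \text{etc.}$$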


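For numerical purposes these coefficients are much more usefully obtained in
the following way:

Let $e^{-p\theta-\frac12 p\theta^2}(1-\theta)^{-p} = b_0 + b_1\theta + b_2\theta^2 + \ldots$ etc.

Take the differential of the logarithms of both sides; then

$$\left(-p - p\theta + \frac{p}{1-\theta}\right)(b_0 + b_1\theta + b_2\theta^2 + \ldots + b_s\theta^s + \ldots) = b_1 + 2b_2\theta + \ldots + s\,b_s\theta^{s-1} + \ldots,$$

i.e.

$$p\theta^2\,(b_0 + b_1\theta + \ldots + b_s\theta^s + \ldots) = (1-\theta)(b_1 + 2b_2\theta + \ldots + s\,b_s\theta^{s-1} + \ldots).$$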











Equating coefficients of $\theta^s$ we have

$$p\,b_{s-2} = (s+1)\,b_{s+1} - s\,b_s,$$

i.e.

$$b_{s+1} = \frac{1}{s+1}\{s\,b_s + p\,b_{s-2}\} \qquad \text{(xii)}.$$

By this difference formula successive $b$’s can be found very quickly if $b_0, b_1, b_2$
are known and we have already found these.

Now

$$c_s = (\sqrt p)^{-s}c_s' = (\sqrt p)^{-s}\,b_s\,(\sqrt p)^{-1} = b_s/(\sqrt p)^{s+1},$$

or $b_s = (\sqrt p)^{s+1}c_s$.

Substituting in (xii)

$$(\sqrt p)^{s+2}c_{s+1} = \frac{1}{s+1}\left\{s\,(\sqrt p)^{s+1}c_s + p\,(\sqrt p)^{s-1}c_{s-2}\right\} = \frac{(\sqrt p)^{s+1}}{s+1}\{s\,c_s + c_{s-2}\},$$

or

$$c_{s+1} = \frac{1}{\sqrt p\,(s+1)}\{s\,c_s + c_{s-2}\} \qquad \text{(xiii)}.$$

This formula gives us very readily the coefficients of $\phi(-D)$ and thus the
expansion is obtained.
We had, Equation (xi),

$$\frac{(\xi+p)^{p-1}e^{-(\xi+p)}}{\Gamma(p)} = \phi(-D)\,\frac{e^{-\xi^2/2p}}{\sqrt{2\pi}} = c_0\tau_1 + c_1\sqrt{2!}\,\tau_2 + c_2\sqrt{3!}\,\tau_3 + \ldots + c_s\sqrt{(s+1)!}\,\tau_{s+1} + \ldots,$$

and all the $c$’s are known since $c_0 = \dfrac{1}{\sqrt p}$, $c_1 = c_2 = 0$.
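To find the area under the curve (xi) up to abscissa $x$, remembering that the
left-hand side is zero from $\xi = -\infty$ to $-p$,

$$\int_{-p}^{\xi}\frac{(\xi+p)^{p-1}e^{-(\xi+p)}}{\Gamma(p)}\,d\xi = \int_{-\infty}^{\xi}\phi(-D)\,\frac{e^{-\xi^2/2p}}{\sqrt{2\pi}}\,d\xi,$$

i.e.

$$\int_0^x\frac{x^{p-1}e^{-x}}{\Gamma(p)}\,dx = \sqrt p\int_{-\infty}^{z}\phi(-D)\,\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz = \sqrt p\int_{-\infty}^{z}\left(c_0\tau_1 + c_1\sqrt{2!}\,\tau_2 + \ldots + c_{s-1}\sqrt{s!}\,\tau_s + \ldots\right)dz.$$

Now

$$\int_{-\infty}^{z}\tau_s\,dz = -\frac{\tau_{s-1}}{\sqrt s},$$

therefore

$$\int_0^x\frac{x^{p-1}e^{-x}}{\Gamma(p)}\,dx = \sqrt p\left\{c_0\int_{-\infty}^{z}\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz - c_1\frac{\sqrt{2!}\,\tau_1}{\sqrt2} - c_2\frac{\sqrt{3!}\,\tau_2}{\sqrt3} - \ldots - c_{s-1}\frac{\sqrt{s!}\,\tau_{s-1}}{\sqrt s} - \ldots\right\}$$

$$= \tfrac12(1+\alpha_z) - \sqrt p\left\{c_1\tau_1 + c_2\sqrt{2!}\,\tau_2 + \ldots + c_{s-1}\sqrt{(s-1)!}\,\tau_{s-1} + \ldots\right\}\quad\text{since } c_0 = \frac{1}{\sqrt p}.$$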
















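Therefore finally

$$\int_0^x\frac{x^{p-1}e^{-x}}{\Gamma(p)}\,dx = \tfrac12(1+\alpha_z) - a_3\tau_3 - \ldots - a_s\tau_s - \ldots \qquad \text{(xiv)},$$

(since $c_1 = c_2 = 0$) where $a_s = \sqrt p\,c_s\sqrt{s!}$.

Now $c_{s+1} = \dfrac{1}{\sqrt p\,(s+1)}\{s\,c_s + c_{s-2}\}$ from equation (xiii),

$$\frac{a_{s+1}}{\sqrt p\,\sqrt{(s+1)!}} = \frac{1}{\sqrt p\,(s+1)}\left\{\frac{s\,a_s}{\sqrt p\,\sqrt{s!}} + \frac{a_{s-2}}{\sqrt p\,\sqrt{(s-2)!}}\right\},$$

therefore

$$a_{s+1} = \frac{1}{\sqrt p\,(s+1)}\left\{s\sqrt{(s+1)}\,a_s + \sqrt{(s+1)(s)(s-1)}\,a_{s-2}\right\} = \frac{1}{\sqrt{p\,(s+1)}}\left\{s\,a_s + \sqrt{s(s-1)}\,a_{s-2}\right\} \qquad \text{(xv)},$$

where $a_0 = 1$, $a_1 = a_2 = 0$.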
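The difference formula (xv), combined with the difference equation for the tetrachoric functions, gives the partial sums of (xiv) directly. The sketch below is a hedged illustration: the function names are hypothetical, `math.erf` stands in for the tables of the probability integral, and the values $p = 49$, $x = 29.4$ (so that $z = -2.8$) are simply those of the worked case in Table I.

```python
import math

def gamma_mean_coefficients(p, s_max):
    """a_0..a_{s_max} from (xv): a_{s+1} = (s a_s + sqrt(s(s-1)) a_{s-2}) / sqrt(p (s+1))."""
    a = [1.0, 0.0, 0.0]                      # a_0 = 1, a_1 = a_2 = 0
    for s in range(2, s_max):
        a.append((s * a[s] + math.sqrt(s * (s - 1)) * a[s - 2]) / math.sqrt(p * (s + 1)))
    return a

def tetrachoric(z, s_max):
    """tau[s] for s = 1..s_max (tau[0] is an unused placeholder)."""
    tau = [0.0, math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)]
    for s in range(2, s_max + 1):
        tau.append(z * tau[s - 1] / math.sqrt(s)
                   - (s - 2) / math.sqrt(s * (s - 1)) * tau[s - 2])
    return tau

def gamma_series(x, p, s_max):
    """Partial sums of (xiv): the normal integral, then minus a_s tau_s for s = 3..s_max."""
    z = (x - p) / math.sqrt(p)
    a = gamma_mean_coefficients(p, s_max)
    tau = tetrachoric(z, s_max)
    total = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # probability integral at z
    sums = [total]
    for s in range(3, s_max + 1):
        total -= a[s] * tau[s]
        sums.append(total)
    return sums

for s, value in zip([0] + list(range(3, 13)), gamma_series(29.4, 49.0, 12)):
    print(s, f"{value:.7f}")
```

Using the signed $z$ throughout means that the rule for negative arguments (taking $\tfrac12(1-\alpha_z)$ and reversing the sign of the even-order functions) is handled by the arithmetic itself.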


The argument of $\tfrac12(1+\alpha_z)$ and of the tetrachoric functions is $\xi/\sqrt p$, which equals
$\dfrac{x-p}{\sqrt p} = z$, say.

Since the terms $\tau_1$ and $\tau_2$ do not appear one might hope that only a few terms
of the expansion (xiv) would be required to obtain a sufficiently accurate result.

$\tfrac12(1+\alpha_z)$ is the ordinary probability integral at $z$.
Note that if $x$ is less than $p$, i.e. $z$ is negative, $\tfrac12(1-\alpha_z)$ must be used instead
of $\tfrac12(1+\alpha_z)$ and the tetrachoric functions of even order must be taken of opposite
sign to those for positive $z$ such as are given in the tables. The odd order functions
are the same for positive and negative $z$:

$$\tau_{2s}(z) = -\tau_{2s}(-z), \qquad \tau_{2s+1}(z) = \tau_{2s+1}(-z).$$

Obviously we could get the area of any portion of the curve between $x = x_1$ and
$x = x_2$ by subtracting two expressions like (xiv) for $z_1$ and $z_2$.

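The general expression for $\dfrac{x^{p-1}e^{-x}}{\Gamma(p)}$ is

$$\frac{x^{p-1}e^{-x}}{\Gamma(p)} = \frac{1}{\sqrt p}\,\tau_1 + \frac1p\,\frac{\sqrt{4!}}{3}\,\tau_4 + \frac{1}{p\sqrt p}\,\frac{\sqrt{5!}}{4}\,\tau_5 + \frac{1}{p^2}\,\frac{\sqrt{6!}}{5}\,\tau_6 + \frac{1}{p^2\sqrt p}\,\frac{\sqrt{7!}}{6}\left(\frac{p+3}{3}\right)\tau_7$$

$$+ \frac{1}{p^3}\,\frac{\sqrt{8!}}{7}\left(\frac{7p+12}{12}\right)\tau_8 + \frac{1}{p^3\sqrt p}\,\frac{\sqrt{9!}}{8}\left(\frac{47p+60}{60}\right)\tau_9 + \frac{1}{p^4}\,\frac{\sqrt{10!}}{9}\left\{\frac{p^2}{18} + \frac{19}{20}p + 1\right\}\tau_{10}$$

$$+ \frac{1}{p^4\sqrt p}\,\frac{\sqrt{11!}}{10}\left(\frac{5}{36}p^2 + \frac{153}{140}p + 1\right)\tau_{11} + \frac{1}{p^5}\,\frac{\sqrt{12!}}{11}\left\{\frac{341}{1440}p^2 + \frac{341}{280}p + 1\right\}\tau_{12}$$

$$+ \frac{1}{p^5\sqrt p}\,\frac{\sqrt{13!}}{12}\left\{\frac{p^3}{162} + \frac{493}{1440}p^2 + \frac{3349}{2520}p + 1\right\}\tau_{13} + \ldots$$

and

$$\int_0^x\frac{x^{p-1}e^{-x}}{\Gamma(p)}\,dx = \tfrac12(1+\alpha_z) - \frac{1}{\sqrt p}\,\frac{\sqrt{4!}}{3\sqrt4}\,\tau_3 - \frac1p\,\frac{\sqrt{5!}}{4\sqrt5}\,\tau_4 - \frac{1}{p\sqrt p}\,\frac{\sqrt{6!}}{5\sqrt6}\,\tau_5 - \frac{1}{p^2}\,\frac{\sqrt{7!}}{6\sqrt7}\left(\frac{p+3}{3}\right)\tau_6$$

$$- \frac{1}{p^2\sqrt p}\,\frac{\sqrt{8!}}{7\sqrt8}\left(\frac{7p+12}{12}\right)\tau_7 - \frac{1}{p^3}\,\frac{\sqrt{9!}}{8\sqrt9}\left(\frac{47p+60}{60}\right)\tau_8 - \frac{1}{p^3\sqrt p}\,\frac{\sqrt{10!}}{9\sqrt{10}}\left\{\frac{p^2}{18} + \frac{19}{20}p + 1\right\}\tau_9$$

$$- \frac{1}{p^4}\,\frac{\sqrt{11!}}{10\sqrt{11}}\left(\frac{5}{36}p^2 + \frac{153}{140}p + 1\right)\tau_{10} - \frac{1}{p^4\sqrt p}\,\frac{\sqrt{12!}}{11\sqrt{12}}\left\{\frac{341}{1440}p^2 + \frac{341}{280}p + 1\right\}\tau_{11}$$

$$- \frac{1}{p^5}\,\frac{\sqrt{13!}}{12\sqrt{13}}\left\{\frac{p^3}{162} + \frac{493}{1440}p^2 + \frac{3349}{2520}p + 1\right\}\tau_{12} - \ldots$$

$$= \tfrac12(1+\alpha_z) - \frac{.81649658}{\sqrt p}\,\tau_3 - \frac{1.22474487}{p}\,\tau_4 - \frac{2.19089023}{p\sqrt p}\,\tau_5 - \frac{1.49071198}{p^2}\,(p+3)\,\tau_6$$

$$- \frac{.84515425}{p^2\sqrt p}\,(7p+12)\,\tau_7 - \frac{.41833001}{p^3}\,(47p+60)\,\tau_8 - \frac{.37184890}{p^3\sqrt p}\,(10p^2+171p+180)\,\tau_9$$

$$- \frac{.15118579}{p^4}\,(175p^2+1377p+1260)\,\tau_{10} - \frac{.05698029}{p^4\sqrt p}\,(2387p^2+12276p+10080)\,\tau_{11}$$

$$- \frac{.02010408}{p^5}\,(560p^3+31059p^2+120564p+90720)\,\tau_{12} - \ldots$$

(ii) Laplacian Form of Expansion.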


This is an expansion with regard to the mode or maximum ordinate as origin.
The mode of $y = \dfrac{x^{p-1}e^{-x}}{\Gamma(p)}$ is at $x = (p-1)$, so that it will be easier to deal with $y$
in the form

$$y = \frac{x^{p'}e^{-x}}{\Gamma(p'+1)},$$

where $p' = (p-1)$.

Let $x = p'+\xi$, i.e. take the mode as origin. Then as before we require to find
$\phi(-D)$ so that

$$\frac{(p'+\xi)^{p'}e^{-(p'+\xi)}}{\Gamma(p'+1)} = \phi(-D)\,\frac{e^{-\frac12\xi^2/p'}}{\sqrt{2\pi p'}} \qquad \text{(xvi)},$$

where $D = \dfrac{d}{dz}$ and $z = \dfrac{\xi}{\sqrt{p'}}$.

The introduction of $\sqrt{p'}$ in the denominator simplifies the integration a little.








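Proceeding as before:

$$\int_{-p'}^{\infty}e^{\theta\xi}\,\frac{(p'+\xi)^{p'}e^{-(p'+\xi)}}{\Gamma(p'+1)}\,d\xi = \int_{-\infty}^{\infty}e^{\theta\xi}\,\phi(-D)\,\frac{e^{-\frac12\xi^2/p'}}{\sqrt{2\pi p'}}\,d\xi,$$

i.e.

$$\int_0^{\infty}\frac{e^{\theta(x-p')}\,x^{p'}e^{-x}}{\Gamma(p'+1)}\,dx = \int_{-\infty}^{\infty}e^{\theta\sqrt{p'}\,z}\,\phi(-D)\,\frac{e^{-\frac12 z^2}}{\sqrt{2\pi}}\,dz,$$

and

$$\frac{e^{-p'\theta}}{(1-\theta)^{p'+1}} = \phi(\theta\sqrt{p'})\int_{-\infty}^{\infty}\frac{e^{\theta\sqrt{p'}\,z-\frac12 z^2}}{\sqrt{2\pi}}\,dz = \phi(\theta\sqrt{p'})\,e^{\frac12 p'\theta^2}\int_{-\infty}^{\infty}\frac{e^{-\frac12(z-\theta\sqrt{p'})^2}}{\sqrt{2\pi}}\,dz = \phi(\theta\sqrt{p'})\,e^{\frac12 p'\theta^2},$$

therefore

$$\phi(\theta\sqrt{p'}) = e^{-p'\theta-\frac12 p'\theta^2}(1-\theta)^{-(p'+1)} \qquad \text{(xvii)}.$$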
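Now if $\phi(-D) = c_0 - c_1D + c_2D^2 + \ldots + (-1)^s c_sD^s + \ldots$,

$$\phi(\theta\sqrt{p'}) = c_0 + c_1(\theta\sqrt{p'}) + c_2(\theta\sqrt{p'})^2 + \ldots + c_s(\theta\sqrt{p'})^s + \ldots = c_0' + c_1'\theta + c_2'\theta^2 + \ldots + c_s'\theta^s + \ldots,$$

where $c_s' = c_s(\sqrt{p'})^s$ or $c_s = c_s'(\sqrt{p'})^{-s}$,

$$e^{-p'\theta-\frac12 p'\theta^2}(1-\theta)^{-(p'+1)} = e^{-p'\theta-\frac12 p'\theta^2-(p'+1)\log(1-\theta)} = e^{-p'\theta-\frac12 p'\theta^2+(p'+1)\left(\theta+\frac{\theta^2}{2}+\frac{\theta^3}{3}+\ldots\right)}$$

$$= e^{\theta+\frac12\theta^2+\frac{(p'+1)}{3}\theta^3+\frac{(p'+1)}{4}\theta^4+\ldots} = c_0' + c_1'\theta + c_2'\theta^2 + \ldots,$$

where $c_0' = 1$, $c_1' = 1$, $c_2' = 1$, $c_3' = \tfrac13(p'+1) + \tfrac23 = \tfrac13(p'+3)$,
and generally by differentiating

$$c_s' = c_{s-1}' + \frac{p'}{s}\,c_{s-3}',$$

or

$$c_s(\sqrt{p'})^s = c_{s-1}(\sqrt{p'})^{s-1} + \frac{p'}{s}\,c_{s-3}(\sqrt{p'})^{s-3},$$

thus

$$c_s = \frac{1}{\sqrt{p'}}\left\{c_{s-1} + \frac1s\,c_{s-3}\right\} \qquad \text{(xviii)},$$

where $c_0 = 1$, $c_1 = \dfrac{1}{\sqrt{p'}}$, $c_2 = \dfrac{1}{p'}$.

Therefore

$$\frac{(p'+\xi)^{p'}e^{-(p'+\xi)}}{\Gamma(p'+1)} = \{c_0 - c_1D + c_2D^2 - \ldots (-1)^s c_sD^s \ldots\}\,\frac{e^{-\frac12 z^2}}{\sqrt{2\pi p'}}$$

$$= \frac{1}{\sqrt{p'}}\left\{c_0\tau_1 + \sqrt{2!}\,c_1\tau_2 + \sqrt{3!}\,c_2\tau_3 + \ldots + \sqrt{(s+1)!}\,c_s\tau_{s+1} + \ldots\right\}.$$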





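To find the area up to abscissa $x$ we have

$$\int_{-p'}^{\xi}\frac{(p'+\xi)^{p'}e^{-(p'+\xi)}}{\Gamma(p'+1)}\,d\xi = \frac{1}{\sqrt{p'}}\int_{-\infty}^{\xi}\left\{c_0\tau_1 + \sqrt{2!}\,c_1\tau_2 + \ldots + \sqrt{(s+1)!}\,c_s\tau_{s+1} + \ldots\right\}d\xi$$

$$= \int_{-\infty}^{z}\left\{c_0\tau_1 + \sqrt{2!}\,c_1\tau_2 + \ldots + \sqrt{(s+1)!}\,c_s\tau_{s+1} + \ldots\right\}dz$$

$$= \tfrac12(1+\alpha_z) - c_1\tau_1 - \sqrt{2!}\,c_2\tau_2 - \sqrt{3!}\,c_3\tau_3 - \ldots - \sqrt{s!}\,c_s\tau_s - \ldots \quad\text{as } c_0 = 1,$$

i.e.

$$\int_0^x\frac{x^{p'}e^{-x}}{\Gamma(p'+1)}\,dx = \tfrac12(1+\alpha_z) - a_1'\tau_1 - a_2'\tau_2 - a_3'\tau_3 - \ldots - a_s'\tau_s - \ldots,$$

where $a_s' = c_s\sqrt{s!}$.
Substituting in (xviii) to obtain the difference equation for the $a$’s we have

$$\frac{a_s'}{\sqrt{s!}} = \frac{1}{\sqrt{p'}}\left\{\frac{a_{s-1}'}{\sqrt{(s-1)!}} + \frac1s\,\frac{a_{s-3}'}{\sqrt{(s-3)!}}\right\},$$

therefore

$$a_s' = \frac{1}{\sqrt{p'}\sqrt s}\left\{s\,a_{s-1}' + \sqrt{(s-1)(s-2)}\,a_{s-3}'\right\} \qquad \text{(xix)},$$

and $a_0' = 1$, $a_1' = \dfrac{1}{\sqrt{p'}}$, $a_2' = \dfrac{\sqrt2}{p'}$.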


By this formula the $a$’s are readily obtained numerically. It is to be noted
that in this case the terms in $\tau_1$ and $\tau_2$ do not vanish, as they did in the expansion
from the mean. The argument of $\tfrac12(1+\alpha_z)$ and of the tetrachoric functions is
$\dfrac{x-p'}{\sqrt{p'}}$ and the remarks with regard to sign made above must be again observed.
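Coefficients in the expansion from the mode:

$$a_0' = 1,\qquad a_1' = \frac{1}{\sqrt{p'}},\qquad a_2' = \frac{\sqrt{2!}}{p'},$$

$$a_3' = \frac{\sqrt{3!}}{p'\sqrt{p'}}\left\{\frac{p'}{3} + 1\right\},\qquad a_4' = \frac{\sqrt{4!}}{p'^2}\left\{\frac{7}{12}p' + 1\right\},\qquad a_5' = \frac{\sqrt{5!}}{p'^2\sqrt{p'}}\left\{\frac{47}{60}p' + 1\right\},$$

$$a_6' = \frac{\sqrt{6!}}{p'^3}\left\{\frac{p'^2}{18} + \frac{19}{20}p' + 1\right\},\qquad a_7' = \frac{\sqrt{7!}}{p'^3\sqrt{p'}}\left\{\frac{5}{36}p'^2 + \frac{153}{140}p' + 1\right\},$$

$$a_8' = \frac{\sqrt{8!}}{p'^4}\left\{\frac{341}{1440}p'^2 + \frac{341}{280}p' + 1\right\},\qquad a_9' = \frac{\sqrt{9!}}{p'^4\sqrt{p'}}\left\{\frac{p'^3}{162} + \frac{493}{1440}p'^2 + \frac{3349}{2520}p' + 1\right\},$$

$$a_s' = \frac{1}{\sqrt{p'}\sqrt s}\left\{s\,a_{s-1}' + \sqrt{(s-1)(s-2)}\,a_{s-3}'\right\}.$$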
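The difference equation (xix) is easily run numerically, and the first few results can be compared with the closed forms that follow from it. This is a minimal sketch with a hypothetical function name; the value $p' = 48$ is chosen only for illustration.

```python
import math

def gamma_mode_coefficients(p_prime, s_max):
    """a'_0..a'_{s_max} from (xix), starting from a'_0 = 1, a'_1 = 1/sqrt(p'), a'_2 = sqrt(2)/p'."""
    a = [1.0, 1.0 / math.sqrt(p_prime), math.sqrt(2.0) / p_prime]
    for s in range(3, s_max + 1):
        a.append((s * a[s - 1] + math.sqrt((s - 1) * (s - 2)) * a[s - 3])
                 / math.sqrt(p_prime * s))
    return a[: s_max + 1]

p_prime = 48.0
a = gamma_mode_coefficients(p_prime, 5)
# Closed forms for a'_3, a'_4, a'_5 obtained by working (xix) through by hand.
closed = [
    math.sqrt(6.0) * (p_prime / 3.0 + 1.0) / p_prime ** 1.5,
    math.sqrt(24.0) * (7.0 * p_prime / 12.0 + 1.0) / p_prime ** 2,
    math.sqrt(120.0) * (47.0 * p_prime / 60.0 + 1.0) / p_prime ** 2.5,
]
for s, value in zip((3, 4, 5), closed):
    print(s, a[s], value)
```

Unlike the expansion from the mean, the coefficients $a_1'$ and $a_2'$ are not zero here, so the series starts contributing at $\tau_1$.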


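We note that the coefficients of powers of $\theta$ in the functions

$$\phi(\theta) = e^{\frac p3\theta^3+\frac p4\theta^4+\frac p5\theta^5+\ldots}$$

and

$$\phi'(\theta) = e^{\theta+\frac12\theta^2+\frac{p+1}{3}\theta^3+\ldots}$$

(in the expansion from the mode we had $p'$ for $p$ in $\phi'(\theta)$) are closely related.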














Then if $c_n$ is the coefficient of $\theta^n$ in $\phi(\theta)$ and $c_n'$ is the coefficient of $\theta^n$ in $\phi'(\theta)$,

$$c_n = \frac pn\,c_{n-3}'.$$
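The relation between the two sets of power-series coefficients can be confirmed exactly with rational arithmetic. In this sketch (function names hypothetical) the coefficients of $\phi(\theta)$ are generated by (xii) and those of $\phi'(\theta)$ by the recurrence $c_s' = c_{s-1}' + \frac ps c_{s-3}'$, with $p$ written in place of $p'$ as the text indicates; the value $p = 7/2$ is an arbitrary test value.

```python
from fractions import Fraction

def mean_theta_coefficients(p, n_max):
    """Coefficients of theta^n in phi(theta), by (xii): b_{s+1} = (s b_s + p b_{s-2}) / (s + 1)."""
    b = [Fraction(1), Fraction(0), Fraction(0)]
    for s in range(2, n_max):
        b.append((s * b[s] + p * b[s - 2]) / (s + 1))
    return b[: n_max + 1]

def mode_theta_coefficients(p, n_max):
    """Coefficients of theta^n in phi'(theta): c'_s = c'_{s-1} + (p / s) c'_{s-3}."""
    c = [Fraction(1), Fraction(1), Fraction(1)]
    for s in range(3, n_max + 1):
        c.append(c[s - 1] + p * c[s - 3] / s)
    return c[: n_max + 1]

p = Fraction(7, 2)
b = mean_theta_coefficients(p, 12)
c_prime = mode_theta_coefficients(p, 9)
for n in range(3, 13):
    # The stated relation: c_n = (p / n) * c'_{n-3}
    print(n, b[n], p * c_prime[n - 3] / n)
```

The relation follows because $\phi'(\theta) = \phi(\theta)/(1-\theta)$ while $d\phi/d\theta = p\theta^2\phi/(1-\theta)$, so that differentiating $\phi$ term by term gives $n\,c_n = p\,c_{n-3}'$.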


(4) In the last expansion it might seem possible to get rid of the terms in
$\tau_1$ and $\tau_2$ by breaking away from Laplace and expanding with regard to $e^{-\frac12\xi^2/q}$
instead of $e^{-\frac12\xi^2/p'}$; then choose $q$ to give us the desired result. In Laplace’s form of
the modal expansion the exponential term is $e^{\frac12\left(\frac{d^2u}{dx^2}\right)_{mo}\xi^2}$, where $u = \log y$ and
$\left(\dfrac{d^2u}{dx^2}\right)_{mo}$ means the value of $\dfrac{d^2u}{dx^2}$ at the mode.

Now

$$y = \frac{x^{p'}e^{-x}}{\Gamma(p'+1)},$$

$$u = \log_e y = p'\log_e x - x - \log_e\Gamma(p'+1),$$

$$\frac{du}{dx} = \frac{p'}{x} - 1,\qquad \frac{d^2u}{dx^2} = -\frac{p'}{x^2},$$

therefore

$$\left(\frac{d^2u}{dx^2}\right)_{mode} = -\frac{p'}{p'^2} = -\frac{1}{p'}.$$

If $\dfrac{x^{p'}e^{-x}}{\Gamma(p'+1)} = \phi(-D)\,\dfrac{e^{-\frac12\xi^2/q}}{\sqrt{2\pi q}}$, where $D = \dfrac{d}{dz}$ and $z = \dfrac{\xi}{\sqrt q}$, we have to find $q$, so
that either the $\tau_1$ or $\tau_2$ term or both will vanish.
By proceeding as before equation (xvii) becomes

$$\phi(\theta\sqrt q) = e^{-p'\theta-\frac12 q\theta^2}(1-\theta)^{-(p'+1)} = e^{-p'\theta-\frac12 q\theta^2-(p'+1)\log(1-\theta)}.$$

The term in $\tau_2$ will vanish if $q = p'+1$ which is the square of the standard-deviation
from the mean, but $\tau_1$ will still be left. However, it does not seem
likely that any advantage will be gained by departing from Laplace’s form of the
exponential term.

Having found the two expansions from the mean and the mode respectively we 
shall now proceed to examine the behaviour of the series by numerical calculation, 
but before doing so we shall endeavour to find a similar series for the Incomplete 
B-function. 


(5) To expand $\displaystyle\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx$ in terms of tetrachoric functions about
the mean.

The mean is at $x = p/(p+q)$.

The standard deviation is $\sigma = \dfrac{\sqrt{pq}}{(p+q)\sqrt{p+q+1}}$.

Take origin at the mean; then $x = p/(p+q) + \xi$. Let

$$\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)} = \phi(-D)\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}\,\sigma} \qquad \text{(xx)},$$

where $D = \dfrac{d}{dy}$, $y = \dfrac{\xi}{\sigma}$.








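As in the case of the Incomplete Γ-function multiply each side by $e^{\theta\xi}$ and
integrate. The limits of the integral on the left-hand side will be $x = 0$ and $x = 1$, as
we take the value of the integral outside these limits to be zero.

The $\xi$ limits will therefore be $-p/(p+q)$ for $x=0$ and $q/(p+q)$ for $x=1$.
Then

$$\int_{-p/(p+q)}^{q/(p+q)}e^{\theta\xi}\,\frac{\{\xi+p/(p+q)\}^{p-1}\{1-\xi-p/(p+q)\}^{q-1}}{B(p,q)}\,d\xi = \int_{-\infty}^{\infty}e^{\theta\xi}\,\phi(-D)\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}\,\sigma}\,d\xi,$$

$$\int_0^1 e^{\theta\{x-p/(p+q)\}}\,\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \int_{-\infty}^{\infty}e^{\theta\sigma y}\,\phi(-D)\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\,dy,$$

and

$$e^{-\theta p/(p+q)}\int_0^1\frac{e^{\theta x}\,x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \phi(\theta\sigma)\int_{-\infty}^{\infty}\frac{e^{\theta\sigma y-\frac12 y^2}}{\sqrt{2\pi}}\,dy = \phi(\theta\sigma)\int_{-\infty}^{\infty}\frac{e^{-\frac12(y-\theta\sigma)^2+\frac12\theta^2\sigma^2}}{\sqrt{2\pi}}\,dy$$

$$= \phi(\theta\sigma)\,e^{\frac12\theta^2\sigma^2} \qquad \text{(xxi)}.$$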
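Now

$$\int_0^1\frac{x^{p-1}(1-x)^{q-1}e^{\theta x}}{B(p,q)}\,dx = \int_0^1\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\left\{1 + \theta x + \frac{\theta^2x^2}{2!} + \ldots + \frac{\theta^sx^s}{s!} + \ldots\right\}dx$$

$$= \frac{B(p,q)}{B(p,q)} + \theta\,\frac{B(p+1,q)}{B(p,q)} + \frac{\theta^2}{2!}\,\frac{B(p+2,q)}{B(p,q)} + \ldots + \frac{\theta^s}{s!}\,\frac{B(p+s,q)}{B(p,q)} + \ldots$$

But

$$\frac{B(p+s,q)}{B(p,q)} = \frac{p\,(p+1)\ldots(p+s-1)}{(p+q)(p+q+1)\ldots(p+q+s-1)},$$

therefore

$$\int_0^1\frac{x^{p-1}(1-x)^{q-1}e^{\theta x}}{B(p,q)}\,dx = 1 + \theta\,\frac{p}{p+q} + \frac{\theta^2}{2!}\,\frac{p\,(p+1)}{(p+q)(p+q+1)} + \ldots + \frac{\theta^s}{s!}\,\frac{p\,(p+1)\ldots(p+s-1)}{(p+q)(p+q+1)\ldots(p+q+s-1)} + \ldots$$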


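From equation (xxi)

$$\phi(\theta\sigma) = e^{-\theta\frac{p}{p+q}-\frac12\theta^2\sigma^2}\left\{1 + \theta\,\frac{p}{p+q} + \frac{\theta^2}{2!}\,\frac{p\,(p+1)}{(p+q)(p+q+1)} + \ldots + \frac{\theta^s}{s!}\,\frac{p\,(p+1)\ldots(p+s-1)}{(p+q)(p+q+1)\ldots(p+q+s-1)} + \ldots\right\} \qquad \text{(xxii)}.$$

Let $\phi(-D) = a_0 - a_1D + a_2D^2 - \ldots (-1)^s a_sD^s + \ldots$,

$$\phi(\theta\sigma) = a_0 + a_1(\theta\sigma) + a_2(\theta\sigma)^2 + a_3(\theta\sigma)^3 + \ldots + a_s(\theta\sigma)^s + \ldots = c_0 + c_1\theta + c_2\theta^2 + \ldots + c_s\theta^s + \ldots,$$

where $c_s = a_s\sigma^s$.
By equating coefficients of powers of $\theta$ in equation (xxii) the coefficients in
$\phi(\theta\sigma)$ can be obtained in terms of $p$ and $q$, for

$$\sigma^2 = \frac{pq}{(p+q)^2(p+q+1)}.$$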





Obviously

$$c_0 = 1,$$

$$c_1 = -p/(p+q) + p/(p+q) = 0,$$

$$c_2 = \frac{p^2}{2!\,(p+q)^2} - \frac{pq}{2\,(p+q)^2(p+q+1)} + \frac{p\,(p+1)}{2!\,(p+q)(p+q+1)} - \frac{p^2}{(p+q)^2}$$

$$= \frac12\,\frac{p}{p+q}\left\{-\frac{p}{p+q} - \frac{q}{(p+q)(p+q+1)} + \frac{p+1}{p+q+1}\right\} = 0.$$


Similarly the other $c$’s can be determined but the work becomes more and more
laborious as we go on.

Unfortunately, as far as the numerical work is concerned, we have failed after
many attempts to find a relation connecting successive $c$’s, similar to that found
in the case of the Incomplete Γ-function. At first it was thought that the
following treatment would facilitate the numerical calculation of these coefficients.


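Let

$$e^{-\theta\frac{p}{p+q}-\frac12\theta^2\sigma^2} = b_0 + b_1\theta + b_2\theta^2 + \ldots + b_s\theta^s + \ldots,$$

then

$$-\frac{p}{p+q}\,\theta - \tfrac12\theta^2\sigma^2 = \log_e\{b_0 + b_1\theta + b_2\theta^2 + \ldots + b_s\theta^s + \ldots\}.$$

Differentiate this and then equate coefficients of powers of $\theta$:

$$(b_0 + b_1\theta + b_2\theta^2 + \ldots + b_s\theta^s + \ldots)\left(-p/(p+q) - \theta\sigma^2\right) = b_1 + 2b_2\theta + 3b_3\theta^2 + \ldots + s\,b_s\theta^{s-1} + \ldots$$

Equate coefficients of $\theta^{s-1}$:

$$s\,b_s + \frac{p}{p+q}\,b_{s-1} = -\sigma^2\,b_{s-2};$$

therefore

$$b_s = -\frac1s\left\{\frac{p}{p+q}\,b_{s-1} + \sigma^2\,b_{s-2}\right\} \qquad \text{(xxiii)}.$$

This formula enables us to calculate the $b$’s very rapidly on the machine when
$p/(p+q)$ and $\sigma^2$ have been determined.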
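From equation (xxii)

$$c_0 + c_1\theta + c_2\theta^2 + \ldots + c_s\theta^s + \ldots = (b_0 + b_1\theta + b_2\theta^2 + \ldots)\left\{1 + \theta\,\frac{p}{p+q} + \frac{\theta^2}{2!}\,\frac{p\,(p+1)}{(p+q)(p+q+1)} + \ldots\right\}.$$

Equate coefficients of $\theta^s$:

$$c_s = b_0\,\frac{1}{s!}\,\frac{p\,(p+1)\ldots(p+s-1)}{(p+q)(p+q+1)\ldots(p+q+s-1)} + b_1\,\frac{1}{(s-1)!}\,\frac{p\,(p+1)\ldots(p+s-2)}{(p+q)(p+q+1)\ldots(p+q+s-2)} + \ldots,$$

i.e.

$$c_s = \sum_{r=0}^{r=s} b_r\,\frac{p\,(p+1)\ldots(p+s-r-1)}{(s-r)!\,(p+q)(p+q+1)\ldots(p+q+s-r-1)}.$$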








The $b$’s, having been calculated previously by (xxiii), this last formula gives a
fairly rapid way of calculating the $c$’s, at least the earlier $c$’s. Then

$$a_s = \frac{1}{\sigma^s}\sum_{r=0}^{r=s} b_r\,\frac{p\,(p+1)\ldots(p+s-r-1)}{(s-r)!\,(p+q)(p+q+1)\ldots(p+q+s-r-1)} \qquad (a_0 = 1) \qquad \text{(xxiv)}.$$
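The two steps just described, the $b$'s by (xxiii) and then the $c$'s by the summation formula, can be sketched as below. The function names are hypothetical, and the use of exact rational arithmetic (Python's `fractions` module, with integer $p$ and $q$ assumed) is a choice made for this illustration so that the vanishing of $c_1$ and $c_2$ can be seen without rounding error.

```python
from fractions import Fraction
from math import factorial

def beta_b(p, q, s_max):
    """b_0..b_{s_max} from (xxiii): b_s = -(mean * b_{s-1} + var * b_{s-2}) / s."""
    mean = Fraction(p, p + q)
    var = Fraction(p * q, (p + q) ** 2 * (p + q + 1))
    b = [Fraction(1), -mean]
    for s in range(2, s_max + 1):
        b.append(-(mean * b[s - 1] + var * b[s - 2]) / s)
    return b[: s_max + 1]

def raw_moment(p, q, k):
    """B(p+k, q) / B(p, q) = p(p+1)...(p+k-1) / ((p+q)...(p+q+k-1))."""
    value = Fraction(1)
    for j in range(k):
        value *= Fraction(p + j, p + q + j)
    return value

def beta_c(p, q, s_max):
    """c_s = sum over r of b_r * (raw moment of order s - r) / (s - r)!."""
    b = beta_b(p, q, s_max)
    return [sum(b[r] * raw_moment(p, q, s - r) / factorial(s - r) for r in range(s + 1))
            for s in range(s_max + 1)]

for s, c in enumerate(beta_c(3, 5, 6)):
    print(s, c)
```

Each $c_s$ is a sum of terms of alternating sign that very nearly cancel; exact fractions avoid the loss of figures that arises when the same sum is formed with a fixed number of decimals.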


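What we require generally is the area represented by $\displaystyle\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx$:

$$\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \int_{-\infty}^{\xi}\phi(-D)\,\frac{e^{-\frac12\xi^2/\sigma^2}}{\sqrt{2\pi}\,\sigma}\,d\xi = \int_{-\infty}^{y}\phi(-D)\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\,dy,$$

i.e.

$$\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \int_{-\infty}^{y}\left\{a_0\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}} - a_1D\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}} + a_2D^2\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}} - \ldots + (-1)^s a_sD^s\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}} \ldots\right\}dy$$

$$= a_0\int_{-\infty}^{y}\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\,dy - a_1\left[\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\right]_{-\infty}^{y} + a_2\left[D\,\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\right]_{-\infty}^{y} - \ldots (-1)^s a_s\left[D^{s-1}\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\right]_{-\infty}^{y} \ldots$$

$$= \tfrac12(1+\alpha) - a_1\sqrt{1!}\,\tau_1 - a_2\sqrt{2!}\,\tau_2 - a_3\sqrt{3!}\,\tau_3 - \ldots - a_s\sqrt{s!}\,\tau_s - \ldots$$

$$= \tfrac12(1+\alpha) - a_1'\tau_1 - a_2'\tau_2 - a_3'\tau_3 - \ldots - a_s'\tau_s - \ldots,$$

where $a_s' = a_s\sqrt{s!}$.
Then

$$a_s' = \frac{\sqrt{s!}}{\sigma^s}\sum_{r=0}^{r=s} b_r\,\frac{p\,(p+1)\ldots(p+s-r-1)}{(s-r)!\,(p+q)(p+q+1)\ldots(p+q+s-r-1)} \qquad \text{(xxv)}.$$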

Now $c_1$ and $c_2$ are equal to zero, so that $a_1'$, $a_2'$ are zero. Thus there are no
terms in $\tau_1$ and $\tau_2$. The argument of the tetrachoric functions and of $\tfrac12(1+\alpha)$ is $y$,
which is equal to $\dfrac{\xi}{\sigma} = \dfrac{x-p/(p+q)}{\sigma}$. On applying the above formula for $a_s'$, we were
greatly disappointed to find, that with the $b$’s to 8 decimal places the expression
under the summation sign in the examples used commenced with 4 or 5 zeros
after the decimal point. As $\sqrt{s!}$ and $\left(\dfrac1\sigma\right)^s$ both increase with $s$ ($\dfrac1\sigma$ being in our
case $> 1$) accuracy to the seventh place in our $a$’s could not be obtained. Accordingly
the formula actually used was of a different type.
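Let

$$\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)} = \sum_1^{\infty}c_s\tau_s,$$

where the argument of the tetrachoric function is again

$$\frac{\xi}{\sigma} = \frac{x-p/(p+q)}{\sigma}.$$

Multiply both sides by $\tau_{s'}$, weighting by the factor $e^{\frac12\xi^2/\sigma^2}$, and integrate from
$-\infty$ to $+\infty$, the left-hand side being taken as zero outside $x = 0$ and $x = 1$.
Then

$$\int_{-p/(p+q)}^{q/(p+q)}\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,\tau_{s'}\,e^{\frac12\xi^2/\sigma^2}\,d\xi = \int_{-\infty}^{\infty}\tau_{s'}\sum_1^{\infty}c_s\tau_s\,e^{\frac12\xi^2/\sigma^2}\,d\xi.$$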








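Since $\displaystyle\int_{-\infty}^{\infty}\tau_s\tau_{s'}\,e^{\frac12\xi^2/\sigma^2}\,d\xi = 0$ only the term in $\tau_s$ will be left on the right-hand
side, i.e.

$$\int_{-\infty}^{\infty}\tau_s\sum_1^{\infty}c_s\tau_s\,e^{\frac12\xi^2/\sigma^2}\,d\xi = c_s\int_{-\infty}^{\infty}\tau_s^2\,e^{\frac12\xi^2/\sigma^2}\,d\xi.$$

Putting $\xi/\sigma = y$; $d\xi = \sigma\,dy$, we have

$$\int_{-p/(p+q)}^{q/(p+q)}\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,\tau_s\,e^{\frac12\xi^2/\sigma^2}\,d\xi = c_s\int_{-\infty}^{\infty}\tau_s^2\,e^{\frac12 y^2}\,\sigma\,dy = \frac{c_s\,\sigma}{s\sqrt{2\pi}},$$

$$c_s = \frac{s\sqrt{2\pi}}{\sigma}\int_0^1\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,\tau_s\,e^{\frac12\left(\frac{x-p/(p+q)}{\sigma}\right)^2}dx$$

$$= \frac{s\sqrt{2\pi}}{\sigma}\,\frac{1}{\sqrt{s!}\sqrt{2\pi}}\int_0^1\left\{\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-1} - \frac{(s-1)(s-2)}{2\cdot 1!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-3} + \frac{(s-1)(s-2)(s-3)(s-4)}{2^2\cdot 2!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-5} - \ldots\right\}\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx$$

$$= \frac{s}{\sqrt{s!}\,\sigma}\int_0^1\left\{\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-1} - \frac{(s-1)(s-2)}{2\cdot 1!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-3} + \frac{(s-1)(s-2)(s-3)(s-4)}{2^2\cdot 2!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-5} - \ldots\right\}\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx \qquad \text{(xxvi)}.$$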


The integral for any particular value of $s$ reduces to a series of B-functions and
so $c_s$ is found.


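The area up to abscissa $x$ is generally required:

i.e. $\displaystyle\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx$.

But

$$\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \int_{-p/(p+q)}^{\xi}\sum_1^{\infty}(c_s\tau_s)\,d\xi = \sigma\int_{-\infty}^{y}\sum_1^{\infty}(c_s\tau_s)\,dy.$$

Now

$$\int_{-\infty}^{y}\tau_s\,dy = -\frac{\tau_{s-1}}{\sqrt s};$$

therefore

$$\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \sigma c_1\int_{-\infty}^{y}\tau_1\,dy - \sigma\left\{c_2\frac{\tau_1}{\sqrt2} + c_3\frac{\tau_2}{\sqrt3} + \ldots + c_s\frac{\tau_{s-1}}{\sqrt s} + \ldots\right\}$$

$$= \sigma c_1\int_{-\infty}^{y}\frac{e^{-\frac12 y^2}}{\sqrt{2\pi}}\,dy - a_1\tau_1 - a_2\tau_2 - \ldots - a_s\tau_s - \ldots,$$

where $a_s = c_{s+1}\dfrac{\sigma}{\sqrt{s+1}}$,

$$= \sigma c_1\,\tfrac12(1+\alpha) - a_1\tau_1 - a_2\tau_2 - a_3\tau_3 - \ldots$$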











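If we put $s=1$, $s=2$, $s=3$ in the above formula (xxvi) for $c_s$,

$$c_1 = \frac1\sigma,$$

$$c_2 = \frac{2}{\sigma\sqrt{2!}}\int_0^1\left(\frac{x-p/(p+q)}{\sigma}\right)\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \frac{2}{\sigma^2\sqrt{2!}}\,\frac{1}{B(p,q)}\left\{B(p+1,q) - p/(p+q)\,B(p,q)\right\} = 0,$$

$$c_3 = \frac{3}{\sigma\sqrt{3!}}\int_0^1\left[\frac{1}{\sigma^2}\left\{x^2 - 2xp/(p+q) + p^2/(p+q)^2\right\} - \frac{2\cdot 1}{2\cdot 1!}\right]\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx$$

$$= \frac{3}{\sigma\sqrt{3!}}\left\{\frac{1}{\sigma^2}\left[\frac{B(p+2,q)}{B(p,q)} - \frac{2p}{p+q}\,\frac{B(p+1,q)}{B(p,q)} + \frac{p^2}{(p+q)^2}\right] - 1\right\}$$

$$= \frac{3}{\sigma^3\sqrt6}\left\{\frac{p\,(p+1)}{(p+q)(p+q+1)} - \frac{2p^2}{(p+q)^2} + \frac{p^2}{(p+q)^2} - \frac{pq}{(p+q)^2(p+q+1)}\right\} = 0,$$

as obtained before by the other method.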


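The terms in $\tau_1$ and $\tau_2$ do not exist, so that the expansion becomes:

$$\int_0^x\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx = \tfrac12(1+\alpha) - a_3\tau_3 - a_4\tau_4 - \ldots - a_s\tau_s - \ldots,$$

where

$$a_s = \frac{\sigma}{\sqrt{s+1}}\,c_{s+1} = \frac{\sigma}{\sqrt{s+1}}\,\frac{(s+1)}{\sqrt{(s+1)!}\,\sigma}\int_0^1\left\{\left(\frac{x-p/(p+q)}{\sigma}\right)^{s} - \frac{s(s-1)}{2\cdot 1!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-2} + \frac{s(s-1)(s-2)(s-3)}{2^2\cdot 2!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-4} - \ldots\right\}\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx$$

$$= \frac{1}{\sqrt{s!}}\int_0^1\left\{\left(\frac{x-p/(p+q)}{\sigma}\right)^{s} - \frac{s(s-1)}{2\cdot 1!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-2} + \frac{s(s-1)(s-2)(s-3)}{2^2\cdot 2!}\left(\frac{x-p/(p+q)}{\sigma}\right)^{s-4} - \ldots\right\}\frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}\,dx \qquad \text{(xxvii)}.$$

The argument for the tetrachoric functions and for $\tfrac12(1+\alpha)$ is $\dfrac{x-p/(p+q)}{\sigma}$. If
this is negative then we must take $\tfrac12(1-\alpha)$.

From the above expression (xxvii) the coefficients of the expansion can be
determined both algebraically and numerically, but for the higher coefficients the
algebraic work becomes exceedingly heavy. It is to be remembered that

$$\sigma^2 = \frac{pq}{(p+q)^2(p+q+1)}.$$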























Suppose $(p+q) = m$; then $\sigma^2 = \dfrac{pq}{m^2(m+1)}$.

The coefficients $a_3$, $a_4$, ... etc. are given below:

$$a_3 = \frac{2}{\sqrt{3!}}\sqrt{\frac{m+1}{pq}}\;\frac{(m-2p)}{(m+2)},$$

$$a_4 = \frac{1}{\sqrt{4!}}\,\frac{6}{(m+2)(m+3)}\left\{\frac{m^2(m+1)}{pq} - (5m+6)\right\},$$

$$a_5 = \frac{1}{\sqrt{5!}}\;24\sqrt{\frac{m+1}{pq}}\;\frac{(m-2p)}{(m+2)(m+3)(m+4)}\left\{\frac{m^2(m+1)}{pq} - (7m+12)\right\},$$

$$a_6 = \frac{1}{\sqrt{6!}}\,\frac{5!}{(m+2)(m+3)(m+4)(m+5)}\left\{\frac{m^4(m+1)^2}{p^2q^2} + \frac{m^2(m+1)}{3pq}\,(m^2-32m-60) - \tfrac23\,(2m^3-41m^2-154m-120)\right\},$$

$$a_7 = \frac{1}{\sqrt{7!}}\;6!\sqrt{\frac{m+1}{pq}}\;\frac{(m-2p)}{(m+2)(m+3)(m+4)(m+5)(m+6)}\left\{\frac{m^4(m+1)^2}{p^2q^2} + \frac{m^2(m+1)}{12pq}\,(7m+15)(m-20) - \tfrac{5}{12}\,\{7m^3-59m^2-342m-360\}\right\},$$

$$a_8 = \frac{1}{\sqrt{8!}}\,\frac{7!}{(m+2)(m+3)(m+4)(m+5)(m+6)(m+7)}\left[\frac{m^6(m+1)^3}{p^3q^3} + \frac{m^4(m+1)^2}{60\,p^2q^2}\,\{47m^2-853m-2100\}\right.$$

$$\left.+ \frac{m^2(m+1)}{30\,pq}\,\{-251m^3+1503m^2+9974m+10920\} + \tfrac1{60}\,\{1271m^4-1697m^3-44512m^2-104364m-65520\}\right].$$


An additional coefficient $a_9$ was calculated for one of our examples, but it was not
considered worth while working it out algebraically.
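The numerical route through (xxvii) amounts to taking the expectation of the polynomial in braces over the B-curve and dividing by $\sqrt{s!}$. The sketch below is illustrative only: the function names are hypothetical, integer $p$ and $q$ are assumed, the central moments are formed exactly from the ratios of B-functions, and only the final division by powers of $\sigma$ is done in floating point. The closed forms used for comparison are those for $a_3$ and $a_4$, which are the standardised skewness and excess of the B-curve divided by $\sqrt{3!}$ and $\sqrt{4!}$.

```python
import math
from fractions import Fraction

def central_moment(p, q, k):
    """Exact k-th moment of the B-curve about its mean p/(p+q)."""
    mean = Fraction(p, p + q)
    total, raw = Fraction(0), Fraction(1)        # raw = E[x^j], starting with j = 0
    for j in range(k + 1):
        total += math.comb(k, j) * raw * (-mean) ** (k - j)
        raw *= Fraction(p + j, p + q + j)        # E[x^(j+1)] from E[x^j]
    return total

def beta_a(p, q, s):
    """a_s from (xxvii): expectation of the polynomial in braces, divided by sqrt(s!)."""
    var = float(Fraction(p * q, (p + q) ** 2 * (p + q + 1)))
    expectation = 0.0
    for k in range(s // 2 + 1):
        # Coefficient of y^(s-2k): (-1)^k s! / (k! (s-2k)! 2^k), an exact integer.
        coeff = (-1) ** k * math.factorial(s) // (
            math.factorial(k) * math.factorial(s - 2 * k) * 2 ** k)
        power = s - 2 * k
        expectation += coeff * float(central_moment(p, q, power)) / var ** (power / 2)
    return expectation / math.sqrt(math.factorial(s))

def closed_a3(p, q):
    m = p + q
    return 2 / math.sqrt(6) * math.sqrt((m + 1) / (p * q)) * (m - 2 * p) / (m + 2)

def closed_a4(p, q):
    m = p + q
    return 6 / math.sqrt(24) * (m * m * (m + 1) / (p * q) - (5 * m + 6)) / ((m + 2) * (m + 3))

print(beta_a(1, 2, 3), closed_a3(1, 2))
print(beta_a(1, 2, 4), closed_a4(1, 2))
```

Forming the moments exactly before dividing by $\sigma^s$ is one way of avoiding the loss of leading figures that made formula (xxv) unusable in practice.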


The coefficients in the tetrachoric expansion obtained by this latter method, 
that is, by using the property of tetrachoric functions as semi-orthogonal functions, 
are identical with those obtained from the first method, which consisted in equating 
moments of the functions on both sides of the equation. Thus we are led to the 
same expansion in both cases. 

(6) The numerical results are certainly interesting but from the utility point of
view they are not very satisfactory. Tables I–VIII contain these results in a
convenient form; the values of the coefficients $a_s$ and $a_s'$, the tetrachoric functions,
the successive terms ($-a_s\tau_s$ and $-a_s'\tau_s$) and the values of the series up to the term
containing $\tau_s$ are given. It is to be noted that the coefficients do not appear in
Tables II and IV but as these are the same as in Tables I and III respectively it





176 On Expansions in Tetrachoric Functions 


was not necessary to repeat them. In all these tables, in the row $s = 0$ we have
placed $\tfrac12(1-\alpha)$ in the column containing the tetrachoric functions and it is only
necessary to draw attention to the fact that in the next column the negative sign
in $-a_s\tau_s$ does not apply to the first term $\tfrac12(1-\alpha)$. The tables will then be easily











TABLE I.

$\int_0^{29.4} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$, $z = -2.8$*.

 s    a_s          Tetrachoric     Terms in         Value of Series
                   Functions τ_s   Series -a_s τ_s  up to term τ_s

 0    1.00000000    .0025551       +.0025551         .0025551
 1    0.00000000   +.00791545          -                 -
 2    0.00000000   -.01567180          -                 -
 3    0.11664237   +.02210325      -.0025782        -.0000231
 4    0.02499479   -.02189644      +.0005473         .0005242
 5    0.00638743   +.01259137      -.0000804         .0004438
 6    0.03228531   +.00159776      -.0000516         .0003922
 7    0.01785148   -.01140536      +.0002036         .0005958
 8    0.00840223   +.01000967      -.0000841         .0005117
 9    0.01470566   +.00006659      -.0000010         .0005107
10    0.01282194   -.00849985      +.0001090         .0006197
11    0.00895618   +.00711870      -.0000638         .0005560
12    0.01042333   +.00164419      -.0000171         .0005388
13    0.01079260   -.00754632      +.0000814         .0006203
14    0.00962776   +.00418464      -.0000403         .0005800
15    0.01015854   +.00374438      -.0000380         .0005419
16    0.01102777   -.00640271      +.0000706         .0006125
17    0.01128126   +.00094254      -.0000106         .0006019
18    0.01209893   +.00523425      -.0000633         .0005386
19    0.01345974   -.00422873      +.0000569         .0005955
20    0.01483350   -.00218561      +.0000324         .0006279
21    0.01660082   +.00525599      -.0000873         .0005407
22    0.01901932   -.00110396      +.0000210         .0005617
23    0.02196131   -.00426227      +.0000936         .0006553
24    0.02561864   +.00346981      -.0000889         .0005664
25    0.03033429   +.00205905      -.0000625         .0005039
26    0.03631783   -.00439701      +.0001597         .0006636
27    0.04391748   +.00042653      -.0000187         .0006449
28    0.05371111   +.00393217      -.0002112         .0004337
29    0.06638572   -.00244866      +.0001626         .0005963
30    0.08285325   -.00248099      +.0002056         .0008018

True value .0005850.

* $z = (x - p)/\sqrt{p} = (29.4 - 49)/7 = -2.8$.


understood, but, in order that a better appreciation of the results may be obtained, 
the value of the series up to a certain term has been plotted against the number 
of that term. A line, drawn across the paper and corresponding to the true value 
of the integral, shows how much the value of the series is in excess or defect of the 
true value of the integral. The various points have been joined by continuous 
wavy lines but, of course, these lines have no real physical meaning. However, by 
joining the points, the graph will, we think, convey a better idea of the variation 


of the values of the series than a set of isolated points would. Figures 1–7 correspond to the data given in Tables I–VII.


Now in the case of the Incomplete Γ-function we obtained two expansions, with
respect to the mean and the mode respectively, and the graphs tell us which of
these two gives us the better approximation. Figs. 1 and 3 (Tables I and III)
show the variations in the values of the series for
$\int_0^{29.4} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$ from the mean
and the mode respectively, while Figs. 2 and 4 give us similar information for
$\int_0^{42} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$.


[Fig. 1. Value of the series plotted against the NUMBER OF TERMS, from ½(1−α) to τ_30, with the 'true value' line; data of Table I.]

[Fig. 2. Value of the series plotted against the NUMBER OF TERMS, from ½(1−α) to τ_30, with the 'true value' line; data of Table II.]

It will be seen that in Fig. 1 the points are much closer to the ‘true value’
line than in Fig. 3 (and similarly in Fig. 2 they are closer than in Fig. 4) so that
the expansion from the mean seems to give a better approximation than that
from the mode and it has the additional advantage that the terms in $\tau_1$ and $\tau_2$ are
missing. Besides, it seems more natural to expand these normal curve functions in
terms of the mean and standard deviation. For comparison purposes the graphs are
all on the same scale. The graphs for the mode and the mean behave in a very




























































TABLE II.

$\int_0^{42} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$, $z = -1$*.

 s    Tetrachoric     Terms in         Value of Series
      Functions τ_s   Series -a_s τ_s  up to term τ_s

 0    +.1586553       +.1586553         .1586553
 1    +.24197074       .0000000             -
 2    -.17109916       .0000000             -
 3    -.00000000       .0000000             -
 4    +.09878417      -.0024691         .1561862
 5    -.04417762      +.0002822         .1564684
 6    -.05410632      +.0017468         .1582152
 7    +.05453404      -.0009735         .1572417
 8    +.02410087      -.0002025         .1570392
 9    -.05302190      +.0007797         .1578189
10    -.00355664      +.0000456         .1578645
11    +.04657133      -.0004172         .1574474
12    -.01034833      +.0001079         .1575553
13    -.03814548      +.0004117         .1579669
14    +.01939964      -.0001868         .1577802
15    +.02921077      -.0002967         .1574834
16    -.02483411      +.0002739         .1577573
17    -.02054429      +.0002318         .1579891
18    +.02755708      -.0003334         .1576556
19    +.01256341      -.0001691         .1574865
20    -.02825493      +.0004191         .1579057
21    -.00548187      +.0000910         .1579967
22    +.02745951      -.0005223         .1574744
23    -.00060803      +.0000134         .1574878
24    -.02558848      +.0006555         .1581433
25    +.00568862      -.0001726         .1579707
26    +.02297227      -.0008343         .1571364
27    -.00978859      +.0004299         .1575663
28    -.01987296      +.0010674         .1586337
29    +.01296514      -.0008607         .1577730
30    +.01649808      -.0013669         .1564061

True value .1577387.
* $z = (x - p)/\sqrt{p} = (42 - 49)/7 = -1$.

[Fig. 3. Value of the series plotted against the NUMBER OF TERMS, from ½(1−α) to τ_30, with the 'true value' line; data of Table III.]


similar manner; for, if we regard the graphs as a wave, it will be noticed that at 
first the amplitude of the wave is big, decreases gradually up to a term in the 
neighbourhood of + and thereafter increases more and more rapidly. This can be 
explained fairly easily; as $s$ increases the tetrachoric functions $\tau_s$ do not increase or
decrease steadily but vary in sign and remain of the same order of magnitude. The
coefficients $a_s$ vary in much the same way (except that they are all positive) up to
a certain point and then begin to increase very fast. In equation (xv) we had

\[ a_{s+1} = \sqrt{\frac{s}{p(s+1)}}\,\left\{\sqrt{s}\,a_s + \sqrt{s-1}\,a_{s-2}\right\}, \]

i.e. $a_{s+1}$ is of order $\sqrt{s/p}\,\{a_s + a_{s-2}\}$, so that as $s$ increases there comes a time when
$\sqrt{s}$ overcomes the reducing effect of $1/\sqrt{p}$ and then the coefficients will continually
increase. For higher values of $p$ this turning point will not be arrived at so soon
and the points will hang closer to the ‘true value’ line for a greater number of
terms, but it does not seem likely that the values of the series will tend to a definite
limit. The equation for the modal expansion coefficients is a similar one and these
coefficients behave in the same way.

[Fig. 4. Value of the series plotted against the NUMBER OF TERMS, from ½(1−α) to τ_30, with the 'true value' line; data of Table IV.]
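The behaviour of the coefficients described above can be reproduced with a short sketch of the difference formula (xv), taking $a_0 = 1$ and $a_1 = a_2 = 0$ for the expansion from the mean. The function name `gamma_coefficients` is an illustrative choice, not the paper's; the early values can be compared with the $a_s$ column of Table I for $p = 49$.

```python
import math


def gamma_coefficients(p, n_max):
    """Coefficients a_0 .. a_n_max of the expansion from the mean.

    Uses the difference formula
        a_{s+1} = sqrt(s / (p (s + 1))) * (sqrt(s) a_s + sqrt(s - 1) a_{s-2}),
    starting from a_0 = 1, a_1 = a_2 = 0.
    """
    a = [0.0] * (n_max + 1)
    a[0] = 1.0
    for s in range(2, n_max):
        a[s + 1] = math.sqrt(s / (p * (s + 1))) * (
            math.sqrt(s) * a[s] + math.sqrt(s - 1) * a[s - 2]
        )
    return a


a = gamma_coefficients(49, 30)
for s in (3, 4, 5, 6, 15, 30):
    print(s, round(a[s], 8))
```

The printed values stay small for moderate $s$ and then grow, which is the turning point the text refers to.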
Turning our attention to the expansions from the mean, Fig. 1 (and Fig. 3 to 
a less extent) would seem to suggest that the tetrachoric series gives quite a good 
approximation to the value of the integral. Although some of the points are very 


TABLE III.

$\int_0^{29.4} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$, $z = -2.6846788$*.

 s    a_s'         Tetrachoric     Terms in          Value of Series
                   Functions τ_s   Series -a_s' τ_s  up to term τ_s

 0    1.00000000    .0036296       +.0036296          .0036296
 1    0.14433757   +.01085979      -.0015675          .0020621
 2    0.02946278   -.02061573      +.0006074          .0026695
 3    0.12521683   +.02752089      -.0034461         -.0007766
 4    0.06166251   -.02503988      +.0015440          .0007675
 5    0.02648957   +.01160193      -.0003073          .0004601
 6    0.04236301   +.00557065      -.0002360          .0002241
 7    0.03460284   -.01460370      +.0005053          .0007295
 8    0.02288715   +.00939504      -.0002150          .0005144
 9    0.02516285   +.00363988      -.0000916          .0004229
10    0.02488684   -.01101274      +.0002741          .0006969
11    0.02136289   +.00579095      -.0001237          .0005732
12    0.02167770   +.00509737      -.0001105          .0004627
13    0.02272772   -.00713419      +.0001621          .0006249
14    0.02256727   +.00058475      -.0000132          .0006117
15    0.02351439   +.00599464      -.0001410          .0004707
16    0.02546065   -.00455186      +.0001158          .0005865
17    0.02739094   -.00248832      +.0000682          .0006547
18    0.02996700   +.00573797      -.0001720          .0004827
19    0.03360181   -.00124666      +.0000419          .0005246
20    0.03803862   -.00454994      +.0001731          .0006977
21    0.04355963   +.00382135      -.0001665          .0005312
22    0.05068120   +.00204641      -.0001037          .0004275
23    0.05968962   -.00471305      +.0002813          .0007089
24    0.07107603   +.00066657      -.0000474          .0006615
25    0.08566837   +.00406751      -.0003485          .0003130
26    0.10443752   -.00276906      +.0002892          .0006022
27    0.12866399   -.00240728      +.0003097          .0009119
28    0.16018283   +.00383980      -.0006151          .0002969
29    0.20147298   +.00036667      -.0000739          .0002230
30    0.25589543   -.00382481      +.0009788          .0012017

True value .0005850.


near to the ‘true value’ line, the approximation is not really a good one. The 
important question for us is: To how many decimal places does the series give the 
result correct ? On going through the tables it will be found that there is no value 
of the series up to the sth term giving the result correct to more than three or four 
places. We now come to the real trouble. Suppose a frequency function is expanded 
in tetrachoric series, how are we to know at what term to stop so as to obtain the 
most accurate result? If the value of an integral is required, the true value is 
wanted. In our work we chose integrals of which the value was already known. 
From Figs. 1—4 it is easily seen that we have as good an approximation at the 


* $z = (29.4 - 48)/\sqrt{48} = -2.6846788$.











TABLE IV.

$\int_0^{42} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$, $z = -.8660254$*.

 s    Tetrachoric     Terms in          Value of Series
      Functions τ_s   Series -a_s' τ_s  up to term τ_s

 0     .1932381        .1932381          .1932381
 1    +.27418875      -.0395757          .1536624
 2    -.16790564      +.0049470          .1586093
 3    -.02798427      +.0035041          .1621134
 4    +.10905792      -.0067248          .1553886
 5    -.02346554      +.0006216          .1560102
 6    -.07134833      +.0030225          .1590328
 7    +.04145828      -.0014346          .1575982
 8    +.04451198      -.0010188          .1565794
 9    -.04705083      +.0011839          .1577634
10    -.02465039      +.0006135          .1583768
11    +.04681171      -.0010000          .1573768
12    +.00975248      -.0002114          .1571654
13    -.04356976      +.0009902          .1581556
14    +.00140962      -.0000318          .1581238
15    +.03877058      -.0009117          .1572122
16    -.00966795      +.0002462          .1574583
17    -.03323150      +.0009102          .1583686
18    +.01562623      -.0004683          .1579003
19    +.02744360      -.0009222          .1569781
20    -.01974338      +.0007510          .1577291
21    -.02171195      +.0009458          .1586749
22    +.02237974      -.0011342          .1575407
23    +.01622878      -.0009687          .1565720
24    -.02382475      +.0016934          .1582654
25    -.01111122      +.0009519          .1592173
26    +.02431475      -.0025394          .1566779
27    +.00643169      -.0008275          .1558504
28    -.02404492      +.0038516          .1597020
29    -.00222729      +.0004487          .1601507
30    +.02317774      -.0059311          .1542196

True value .1577387.

TABLE V.

$\int_0^{.5} \frac{x^{14}(1-x)^{4}}{B(15,5)}\,dx$, $y = -2.6457513$†, $p = 15$, $q = 5$, $m = 20$.

 s    a_s          Tetrachoric     Terms in         Value of Series
                   Functions τ_s   Series -a_s τ_s  up to term τ_s

 0    1.00000000    .0040751        .0040751         .0040751
 3    -.19638608   +.02950904      +.0057952         .0098703
 4    +.01452267   -.02602453      +.0003780         .0102482
 5    +.03818545   +.01099737      -.0004199         .0098282
 6    +.05515045   +.00712711      -.0003931         .0094352
 7    -.01389639   -.01561177      -.0002170         .0092183
 8    -.03609105   +.02031787      +.0007333         .0099516

True value .0096054.

* $z = (42 - 48)/\sqrt{48} = -.8660254$.

† $y = \{x - p/(p+q)\}/\sigma = (.5 - 15/20)/.09449112 = -2.6457513$.
































TABLE VI.

$\int_0^{.5} \frac{x^{3}(1-x)^{1/2}}{B(4,\,3/2)}\,dx$, $y = -1.3010412$*, $p = 4$, $q = 3/2$, $m = 5\tfrac12$.

 s    a_s          Tetrachoric     Terms in         Value of Series
                   Functions τ_s   Series -a_s τ_s  up to term τ_s

 0    1.00000000    .0966212        .0966212         .0966212
 3    -.28327885   +.04839695      +.0137098         .1103310
 4    -.01400852   +.05941568      +.0008323         .1111633
 5    +.16688842   -.06703628      +.0111876         .1223509
 6    +.05349154   -.00778490      +.0004164         .1227673
 7    -.05325140   +.05554783      +.0029580         .1257253
 8    -.09445982   -.01930950      -.0018240         .1239013
 9    -.00063525   -.03745046      -.0000238         .1238775

True value .1188790.

TABLE VII.

$\int_0^{.1} \frac{x^{3}(1-x)^{1/2}}{B(4,\,3/2)}\,dx$, $y = -3.59087385$†.

 s    a_s          Tetrachoric     Terms in         Value of Series
                   Functions τ_s   Series -a_s τ_s  up to term τ_s

 0    1.00000000    .0001648        .0001648         .0001648
 3    -.28327885   +.00307042      +.0008698         .0010346
 4    -.01400852   -.00458580      -.0000642         .0009704
 5    +.16688842   +.00530458      -.0008853         .0000851
 6    +.05349154   -.00442734      +.0002368         .0003219
 7    -.05325140   +.00191632      +.0001020         .0004239
 8    -.09445982   +.00111687      +.0001055         .0005294
 9    -.00063525   -.00291774      -.0000019         .0005275

True value .00023603.


5th or 6th term as at the 15th, say, and better than at the 30th. Of course, one 
might calculate the various terms till the sums became more or less steady, take 
the mean of these sums after the steady stage is reached and use that as the value 
required. This process, however, will not give a greater accuracy than three or four 
decimal places correct and very likely the result will not be so good as that. Besides 
which it is difficult to give such an arbitrary weighting of terms a theoretical 
justification. Thus it seems that the tetrachoric series is not at all suitable for the
representation of the Incomplete Γ-function.
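The failure to settle down can be seen numerically with a minimal sketch for the case of Table II ($p = 49$, upper limit 42, expansion from the mean). It assumes the tetrachoric functions are $\tau_s = He_{s-1}(h)\,\phi(h)/\sqrt{s!}$ with $h = (x - p)/\sqrt{p}$, which agrees with the tabulated values, and it obtains the reference value from the ordinary power series for the incomplete Γ-function rather than from the published tables. All function names are illustrative.

```python
import math


def gamma_coefficients(p, n_max):
    """a_0..a_n_max from the difference formula (xv); a_1 = a_2 = 0."""
    a = [0.0] * (n_max + 1)
    a[0] = 1.0
    for s in range(2, n_max):
        a[s + 1] = math.sqrt(s / (p * (s + 1))) * (
            math.sqrt(s) * a[s] + math.sqrt(s - 1) * a[s - 2]
        )
    return a


def tetrachoric_functions(h, n_max):
    """tau_1..tau_n_max at argument h (index 0 is an unused placeholder)."""
    phi = math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)
    he = [1.0, h]  # Hermite polynomials He_0, He_1
    for k in range(1, n_max):
        he.append(h * he[k] - k * he[k - 1])  # He_{k+1}
    return [0.0] + [
        he[s - 1] * phi / math.sqrt(math.factorial(s))
        for s in range(1, n_max + 1)
    ]


def series_values(p, x, n_max):
    """Value of the series up to each term tau_s, keyed by s."""
    h = (x - p) / math.sqrt(p)
    a = gamma_coefficients(p, n_max)
    tau = tetrachoric_functions(h, n_max)
    total = 0.5 * (1.0 + math.erf(h / math.sqrt(2.0)))  # normal area up to h
    values = {0: total}
    for s in range(3, n_max + 1):
        total -= a[s] * tau[s]
        values[s] = total
    return values


def incomplete_gamma_ratio(p, x, terms=500):
    """Integral from 0 to x of t^(p-1) e^(-t) / Gamma(p), by power series."""
    term, acc = 1.0, 1.0
    for k in range(1, terms):
        term *= x / (p + k)
        acc += term
    return acc * math.exp(p * math.log(x) - x - math.lgamma(p + 1))


sums = series_values(p=49, x=42.0, n_max=30)
true_value = incomplete_gamma_ratio(49, 42.0)
for s in (4, 6, 15, 30):
    print(s, round(sums[s], 7), round(sums[s] - true_value, 7))
```

Printing the difference from the reference value at a few orders shows the oscillation about the true value rather than convergence to it.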


* $y = \{x - p/(p+q)\}/\sigma = (.5 - 8/11)/.17468526 = -1.3010412$.

† $y = \{x - p/(p+q)\}/\sigma = (.1 - 8/11)/.17468526 = -3.59087385$.

When we consider the tables and graphs for the Incomplete B-function, the
results are certainly no better than in the case of the Incomplete Γ-function.
Unfortunately, owing to the lack of a difference formula connecting the successive
coefficients, we only calculated a few terms, but the behaviour of the graphs is
similar to that of the graphs of the Incomplete Γ-function. Fig. 5 is very like
Figs. 1–4 but Figs. 6 and 7 are rather different. In Fig. 5 the integral is
$\int_0^{.5} \frac{x^{14}(1-x)^{4}}{B(15,5)}\,dx$, where $p$ is of high value and $q$ is of moderate size. In Figs. 6 and 7
the integral is $\int_0 \frac{x^{3}(1-x)^{1/2}}{B(4,\,3/2)}\,dx$, where the upper limits are .5 and .1 respectively.
Here $p$ is 4 and $q$ is 3/2. It seems in the incomplete Γ- and B-functions that the
points come nearer the ‘true value’ line for the tail of the integral than if the
upper limit is near the mode.















































[Fig. 5. Value of the series plotted against the NUMBER OF TERMS (τ_3 to τ_8), with the 'true value' line; data of Table V.]

[Fig. 6. Value of the series plotted against the NUMBER OF TERMS (τ_3 to τ_9), with the 'true value' line; data of Table VI.]

[Fig. 7. Value of the series plotted against the NUMBER OF TERMS (½(1−α), τ_3 to τ_9); data of Table VII.]


Table VIII gives the results for $\int_0^{49} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$ and, since $z = (x - p)/\sqrt{p} = (49 - 49)/7 = 0$
for the expansion from the mean, all the tetrachoric functions of even order vanish.
It will be observed that the values of the series vary in a similar fashion to the
others and not one of these gives the result correct to more than four decimal places.

TABLE VIII.

$\int_0^{49} \frac{x^{48}e^{-x}}{\Gamma(49)}\,dx$, $z = 0$*.

(Expansion with regard to the Mean.)

 s    a_s          Tetrachoric     Terms in         Value of Series
                   Functions τ_s   Series -a_s τ_s  up to term τ_s

 0    1.00000000    .5000000        .5000000         .5000000
 3     .11664237   -.1628675       +.0189973         .5189973
 5     .00638743   +.1092549       -.0006979         .5182994
 7     .01785148   -.0842920       +.0015047         .5198041
 9     .01470566   +.0695373       -.0010226         .5187815
11     .00895618   -.0596711       +.0005345         .5193160
13     .01079260   +.0525526       -.0005672         .5187488
15     .01015854   -.0471442       +.0004789         .5192277

True value .5189993.

* $z = (x - p)/\sqrt{p} = (49 - 49)/7 = 0$.


After a careful study of the tables and graphs we are forced to the conclusion
that a tetrachoric series is of no practical utility as a representation of skew
frequency curves such as $y = y_0 x^{p-1}e^{-x}$ and $y = y_0 x^{p-1}(1-x)^{q-1}$, and although it
may be rash to generalise from our results on these two types it would seem
that such a series cannot be generally suitable to represent skew frequency
distributions. Moreover, the types, which have been discussed, are of common
occurrence and for these the expansion is certainly futile.

The true values of the incomplete Γ-function were taken from Tables of the
Incomplete Γ-function which will be shortly issued by H.M. Stationery Office. The
values of the incomplete B-function were determined by direct calculation; the
power of $(1 - x)$ was expanded and the result readily obtained with the help of the
relation

\[ B(p, q) = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)}. \]
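The direct calculation described here can be sketched for the case of Table V, where $q$ is a whole number so that $(1-x)^{q-1}$ expands into a finite binomial sum that is integrated term by term. The helper names are illustrative, and the restriction to integer $q$ is a simplification of this sketch, not a statement about how the fractional case was handled in the paper.

```python
import math


def complete_beta(p, q):
    """B(p, q) from the relation B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def incomplete_beta_ratio(x, p, q):
    """Integral from 0 to x of t^(p-1) (1-t)^(q-1) / B(p, q), integer q only.

    (1 - t)^(q-1) is expanded by the binomial theorem and each power of t
    is integrated exactly.
    """
    total = 0.0
    for k in range(q):
        total += (-1) ** k * math.comb(q - 1, k) * x ** (p + k) / (p + k)
    return total / complete_beta(p, q)


value = incomplete_beta_ratio(0.5, 15, 5)
print(round(value, 7))
```

For $p = 15$, $q = 5$ and upper limit .5 the result can be compared with the true value quoted under Table V.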

In his Vorlesungen über die Grundzüge der mathematischen Statistik (Hamburg,
1920) Charlier, when dealing with skew frequency curves, gives as the general
equation for the skew frequency curves of his Type A

\[ Y = \phi_0 + \beta_3\phi_0''' + \beta_4\phi_0^{iv} + \beta_5\phi_0^{v} + \dots, \]

where $\phi_0 = \frac{1}{\sqrt{2\pi}}e^{-\frac12 x^2}$ and $\phi_0'''$, $\phi_0^{iv}$, $\phi_0^{v}$, ... are the third, fourth, fifth, etc.
differential coefficients of $\phi_0$, i.e. $Y$ is really expressed in a series of tetrachoric functions, or, apart from a constant factor,

\[ Y = \tau_1 - \beta_3\sqrt{4!}\,\tau_4 + \beta_4\sqrt{5!}\,\tau_5 - \beta_5\sqrt{6!}\,\tau_6 + \dots \]

$\beta_3$, $\beta_4$, $\beta_5$, etc. along with $M$ (the mean) and $\sigma$ Charlier calls the ‘characteristics’
of the distribution curve. Now he seems to think that generally the coefficients
$\beta_3$ and $\beta_4$* will only be required and so he has tabled $\phi_0(x)$, $d^3\phi_0/dx^3$, $d^4\phi_0/dx^4$ for $x = .00$
to $3.00$ at intervals of .01 and also for $x = 4$ (Tables III, IV and V on pp. 123–125)
to four decimal places. With the series up to $\beta_4$ the theoretical $Y$-coordinate will
be found, according to Charlier, but from our experience of tetrachoric functions
we are exceedingly sceptical about the accuracy of such a result. In fact, we feel
certain that the approximation will not be a good one. If the frequency curve
be little different from the normal then possibly the approximation would not be
very bad.








The above investigation was undertaken by me at the suggestion of Professor 
Pearson and I am indebted to him for several hints. My grateful thanks are due 
to Miss I. M‘Learn for her assistance in the preparation of the diagrams.


* Charlier defines the ‘skewness’ $S$ to be $S = 3\beta_3$ and the ‘excess’ $E$ to be $E = 3\beta_4$.








MISCELLANEA. 


I. On the χ² test of Goodness of Fit.
By KARL PEARSON, F.R.S. 


In a paper published in the Philosophical Magazine for July 1900, pp. 157–175, I dealt with
the following problem: A very large population is sampled, say, the population $n_1, n_2, \dots n_s, \dots n_p$
with total $N$, and any individual sample is $m_1, m_2, \dots m_s, \dots m_p$, total $M$. The “probable constitution”
is given by:

\[ m_1' = \frac{M}{N}\,n_1,\quad m_2' = \frac{M}{N}\,n_2,\ \dots\ m_s' = \frac{M}{N}\,n_s,\ \dots\ m_p' = \frac{M}{N}\,n_p. \]

If a large number of samples of size $M$ are taken, what is the distribution of variations from
the “probable constitution” in these samples?

I showed that if the distribution of categories were such that no category contained a few
isolated units, then the distribution depended on the calculation of $\chi^2 = S\{(m_s - m_s')^2/m_s'\}$, and provided
a value for the probability $P$ that samples would not diverge more than any given sample
from the “probable constitution.” This process is now familiar to statisticians as the χ², P test.


The sole limiting conditions were that the samples should be random, and each should be of
the same size $M$.
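A minimal sketch of the quantities just defined, using invented category frequencies purely for illustration: the “probable constitution” $m_s' = (M/N)\,n_s$ is formed from a known population, and χ² is the sum of $(m_s - m_s')^2/m_s'$ over the categories. The function names and the numbers are assumptions of this sketch.

```python
def probable_constitution(population, sample_size):
    """m'_s = (M / N) n_s for each category of a known population."""
    total = sum(population)
    return [sample_size * n / total for n in population]


def chi_square(sample, expected):
    """chi^2 = S (m_s - m'_s)^2 / m'_s over all categories."""
    return sum((m - e) ** 2 / e for m, e in zip(sample, expected))


population = [500, 300, 200]  # hypothetical n_s, total N = 1000
sample = [45, 35, 20]         # hypothetical m_s, total M = 100

expected = probable_constitution(population, sum(sample))
chi2 = chi_square(sample, expected)
print(expected, chi2)
```

The probability $P$ would then be read from the tables of goodness of fit; that step is not reproduced here.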

In some cases the “probable constitution” ($m'$ series) can be found at once because the
distribution of the sampled population is known a priori. In other cases the values of the $m'$ series
have to be approximated to, and such approximations are the general rule in all discussions of
probable error.

We say for example that the standard deviation of the mean of a sample taken from an
indefinitely large population of size $N$ and standard deviation $\sigma$ is $\sigma/\sqrt{n}$, where $n$ is the size of
the sample.


We say that the standard deviation of second moment-coefficients of samples of size $n$ is

\[ \frac{\sqrt{\mu_4 - \mu_2^2}}{\sqrt{n}}, \]

where $\mu_2$ ($= \sigma^2$) and $\mu_4$ are the second and fourth moment-coefficients of the population sampled.
In fact every constant of the sample has a probable error determinable in terms of the constants
of the sampled population. All these distributions of deviations from “probable constitution”
are true for perfectly general but random samples of size $n$ drawn from our indefinitely large
population.
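The two standard deviations quoted above can be written as a short sketch. Here the moment-coefficients are computed from a given set of values; the data are invented, and in practice one would insert whatever values of $\mu_2$ and $\mu_4$ are available. The function names are illustrative.

```python
import math


def moment_coefficients(values):
    """Return (mu_2, mu_4), the second and fourth moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu4 = sum((v - mean) ** 4 for v in values) / n
    return mu2, mu4


def sd_of_mean(mu2, n):
    """Standard deviation of the mean: sigma / sqrt(n)."""
    return math.sqrt(mu2) / math.sqrt(n)


def sd_of_second_moment(mu2, mu4, n):
    """Standard deviation of the second moment: sqrt(mu_4 - mu_2^2) / sqrt(n)."""
    return math.sqrt(mu4 - mu2 ** 2) / math.sqrt(n)


values = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical observations
mu2, mu4 = moment_coefficients(values)
sd_mean = sd_of_mean(mu2, len(values))
sd_mu2 = sd_of_second_moment(mu2, mu4, len(values))
print(mu2, mu4, sd_mean, sd_mu2)
```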

But unfortunately in a considerable number of cases that sampled population is unknown to
us; we have no direct means of finding $\mu_2$, $\mu_4$, etc. What accordingly do we do? Why, we replace
the constants of the sampled population by those calculated from the sample itself, as the best
information we have. And the justification of this proceeding is not far to seek. $\mu_s$ as found for
the sample will only differ from the $\mu_s$ of the sampled population by terms of the order $1/\sqrt{n}$;
for example if we are not dealing with small samples, and $\sigma'$ be the standard deviation of the
sample, $\sigma'$ differs from $\sigma$ by terms of the order $\sigma/\sqrt{2n}$ and accordingly the standard deviation of
the mean is written $\sigma'/\sqrt{n}$ when it is really $\sigma/\sqrt{n}$. This method of treating probable errors is
universal in the case of fair sized samples to-day and scarcely needs justification. In writing the
sample values of the constants for those of the sampled population, we do not in any way alter
our original supposition that we are considering the distribution of random samples of size $n$.
We have still $p - 1$ degrees of freedom, if we have $p$ categories of frequency.


The process of substituting sample constants for sampled population constants does not mean
that we select out of possible samples of size $n$, those which have precisely the same values of
the constants as the individual sample under discussion. Clearly the given sample has definite
moment-coefficients, and if there be $p$ frequency categories the first $p - 1$ moment-coefficients
together with the size $n$ of the sample would suffice to fix all the frequencies of the $p$ categories*.
Hence no deviations from the “probable constitution” would be possible if we confined our
attention to samples of $n$ tied to the constants of the given sample! In using the constants of
the given sample to replace the constants of the sampled population, we in no wise restrict the
original hypothesis of free random samples tied down only by their definite size. We certainly do
not by using sample constants reduce in any way the random sampling degrees of freedom.


What we actually do is to replace the accurate value of χ², which is unknown to us, and
cannot be found, by an approximate value, and we do this with precisely the same justification as
the astronomer claims, when he calculates his probable error on his observations, and not on the
mean square error of an infinite population of errors which is unknown to him. The whole of this
matter was very fully discussed (pp. 164–7) in my original paper dealing with the χ², P test.

The above re-description of what seem to me very elementary considerations would be
unnecessary had not a recent writer in the Journal of the Royal Statistical Society† appeared to
have wholly ignored them. He considers that I have made serious blunders in not limiting my
degrees of freedom by the number of moments I have taken; for example he asserts (p. 93)
that if a frequency curve be fitted by the use of four moments then the $n'$ of the tables of
goodness of fit should be reduced by 4. I hold that such a view is entirely erroneous, and that
the writer has done no service to the science of statistics by giving it broad-cast circulation in
the pages of the Journal of the Royal Statistical Society.


What he would obtain if he placed this restriction on his samples is not the χ² for the
distribution of samples of size $n$, but of samples which give definite moments. The absurdity of this
manner of approach is at once obvious, if as I have suggested, we consider the $p$ first-moments,
as there is no reason why we should not do, for these are just as much “fixed” as the first four,
and the conclusion must be that we can learn nothing at all about variation from our sample;
for we have $p$ frequency groups and $p$ tying conditions.

When we wish to find the probable error of a mean or a standard deviation, we do not start 
by fixing down these characters to their values in the individual sample; we suppose them 
to take all the possible values they could take by sampling, and after we have reached our 
measure of variation we then put into our formula the sampled values, to give an approximate 
value to the functions reached, because we are in ignorance of the real values in the sampled 
population. 

The writer in the Journal of the Royal Statistical Society speaks as if I applied χ² to a
contingency table starting by fixing the marginal totals. As far as I am aware I am not guilty of
this. My conception of contingency is very different from my conception of χ². I started my
conception of contingency with the idea not of a random sample, but with the idea that some
function of frequencies alone without regard to their relation to the measured characters would
lead to the value of the correlation. Naturally I started from the deviation of the individual cell
contents from the same cell contents on the basis of independent probability, as determined by
the marginal totals. There was no question of sampling in the matter. In now fairly usual
notation I termed

\[ m_{st} - \frac{m_{s\cdot}\,m_{\cdot t}}{M} \]

the cell contingency and after playing about with such cell contingencies for a time succeeded in
finding a function $\phi^2$ of them which for indefinitely fine grouping for a bi-variate normal frequency
distribution gave the correlation $r$ as

\[ r = \sqrt{\frac{\phi^2}{1+\phi^2}}, \]

where

\[ \phi^2 = \frac{1}{M}\,S\left\{\frac{\left(m_{st} - \dfrac{m_{s\cdot}\,m_{\cdot t}}{M}\right)^2}{\dfrac{m_{s\cdot}\,m_{\cdot t}}{M}}\right\} \qquad\dots\dots(\alpha). \]

* This is Thiele’s method of representing frequency distributions.

† Vol. LXXXV. p. 87, 1922.


M 


I see no reason for confusing this $\phi^2$ as a measure of correlation with the χ² which is a measure
of variability in the samples of constant size drawn from an indefinitely large population. It was
different in its origin, as far as I am concerned, and different in its use. It is only when we come
to consider the probable error of $\phi^2$ that we have to distinguish between (a) the actual marginal
totals of the sample and (b) the probable constitution of the marginal totals as deduced from an
indefinitely large sampled population.





There are, as those who have read Biometrika* will recognise, considerable difficulties about
determining the probable error of $\phi^2$, where

\[ 1 + \phi^2 = S\left(\frac{m_{st}^2}{m_{s\cdot}\,m_{\cdot t}}\right), \]

and the determination of the mean $\phi^2$ and of the standard deviation of $\phi^2$ involves very
troublesome analysis.
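The mean squared contingency and the identity for $1 + \phi^2$ can be checked on a small table with a sketch such as the following. The table entries are invented, the marginal totals are those of the sample itself, and the function name is illustrative.

```python
import math


def mean_square_contingency(table):
    """Return (phi^2, S(m^2 / (row total * column total)) - 1) for a table.

    phi^2 is the sum of squared cell contingencies, each divided by the
    independence value (row total * column total / M), all divided by M.
    The second value should agree with the first by the identity for 1 + phi^2.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    phi2 = 0.0
    identity = -1.0
    for s, row in enumerate(table):
        for t, m in enumerate(row):
            independent = rows[s] * cols[t] / total
            phi2 += (m - independent) ** 2 / independent / total
            identity += m * m / (rows[s] * cols[t])
    return phi2, identity


table = [[10, 20], [30, 40]]  # hypothetical cell frequencies, M = 100
phi2, identity = mean_square_contingency(table)
r = math.sqrt(phi2 / (1.0 + phi2))
print(phi2, identity, r)
```

The two returned values agree, which is the identity used in the text; multiplying $\phi^2$ by the total frequency gives the corresponding sum of squared deviations over the independence values.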

So laborious is the arithmetic involved that for ordinary statistical use it became doubtful
whether it would not be better to define $\phi^2$ as the mean squared contingency measured not from
the marginal totals of the sample, but from the “probable constitution” of the marginal totals
of the sample as deduced from the sampled population. In this case if

\[ m'_{st} = \frac{M}{N}\,n_{st},\quad m'_{s\cdot} = \frac{M}{N}\,n_{s\cdot},\quad m'_{\cdot t} = \frac{M}{N}\,n_{\cdot t}, \]

\[ \phi^2 = \frac{1}{M}\,S\left\{\frac{\left(m_{st} - \dfrac{m'_{s\cdot}\,m'_{\cdot t}}{M}\right)^2}{\dfrac{m'_{s\cdot}\,m'_{\cdot t}}{M}}\right\} \qquad\dots\dots(\beta), \]

or,

\[ 1 + \phi^2 = S\left(\frac{m_{st}^2}{m'_{s\cdot}\,m'_{\cdot t}}\right); \]

with this change of definition the probable error and mean of $\phi^2$ are more easily obtainable, and
in this case for the first time, $M\phi^2$ can be looked upon as equivalent to a χ².

The form (α) from my standpoint cannot be treated as a χ², because it is not the
deviation-measure of a given sample from the sampled population. Nor again is (β) the deviation-measure
of the sample from the sampled population, unless we assume that population to have zero
contingency, i.e. $m'_{st} = m'_{s\cdot}\,m'_{\cdot t}/M$.

But χ² may in the form (β) be treated as a deviation-measure of the actual sample from an
artificial sampled population, which differs from the actual population in having no correlation
or contingency, but having the same marginal distributions of the two characters.

The moment, however, we assume form (β) for our contingency we are giving, what we clearly
must give, absolute freedom to the marginal totals of our samples. The sole limit on our sample
is its total size $M$. But when we come to actually calculating $\phi^2$ for the individual sample, or the
mean value or the standard deviation (i.e. probable error) of $\phi^2$ for a series of samples, we have
only one course open to us, if we do not know the constants of the sampled population, we must
insert the marginal totals of the individual sample of which we have cognizance in place of the

* Vol. v. p. 191, Vol. x. p. 570, Vol. x1. p. 570, and Vol. xi. p. 259. 










unknown values of the sampled population. Thus (α) and (β) provide ultimately the same $\phi^2$,
but the probable error of $\phi^2$ and the mean value of $\phi^2$ will be different in the two cases. In the
first case we vary our marginal totals with the sample as they obviously would vary in practice.
In the second case we define our $\phi^2$ to be a deviation from the independent probability of an
artificial population, we do not keep the marginal totals of the sample fixed any more than in (α).
But if we think in terms of χ² (and not $\phi^2$) we appear to do so because ultimately we have to take
our marginal probabilities as those of the sample in default of a knowledge of any better values.

This point seems to me well illustrated in what my critic in the Journal of the Royal
Statistical Society has to say on p. 90 of his paper about Messrs Greenwood and Yule’s use of $\chi^2$
for a fourfold table. He asserts that they ought to have entered the table of goodness of fit with
$n'=2$. The problem before them was whether their fourfold tables could possibly be samples of
bi-variate independent probability distributions. Each sample from such a distribution would
have perfectly free cell frequencies $m_{11}$, $m_{12}$, $m_{21}$, $m_{22}$, subject to the sole binding condition that

$$m_{11} + m_{12} + m_{21} + m_{22} = M.$$

The proper $\chi^2$ is given by

$$
\chi^2 =
\frac{\left(m_{11} - \dfrac{m'_{1\cdot}\,m'_{\cdot 1}}{M}\right)^2}{\dfrac{m'_{1\cdot}\,m'_{\cdot 1}}{M}}
+ \frac{\left(m_{12} - \dfrac{m'_{1\cdot}\,m'_{\cdot 2}}{M}\right)^2}{\dfrac{m'_{1\cdot}\,m'_{\cdot 2}}{M}}
+ \frac{\left(m_{21} - \dfrac{m'_{2\cdot}\,m'_{\cdot 1}}{M}\right)^2}{\dfrac{m'_{2\cdot}\,m'_{\cdot 1}}{M}}
+ \frac{\left(m_{22} - \dfrac{m'_{2\cdot}\,m'_{\cdot 2}}{M}\right)^2}{\dfrac{m'_{2\cdot}\,m'_{\cdot 2}}{M}}
\qquad (\gamma),
$$

and this has three degrees of freedom and is what Messrs Yule and Greenwood desired to find,
and they properly used the value of $P$ for $n'=4$.


Then like the astronomer, who finding the probable error of his mean to be $0.67449\,\sigma/\sqrt{M}$ and
not knowing the $\sigma$ of his sampled population, puts it equal to the $\sigma$ of his observations, so
Messrs Yule and Greenwood very properly replaced the marginal totals of their unknown
population by those of their sample, but very properly did not replace $n'=4$ by $n'=2$!
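The numerical stake in the choice between $n'=4$ and $n'=2$ can be sketched as follows. This is a minimal illustration, not the construction of the published tables: it assumes, on the reading given in the text itself, that $n'=4$ corresponds to three degrees of freedom and $n'=2$ to one, and it uses the standard closed forms for the upper tail of the $\chi^2$ distribution in those two cases. The function name `chi2_tail` is hypothetical.

```python
import math


def chi2_tail(chi2: float, dof: int) -> float:
    """Upper-tail probability P for a chi-squared value with 1 or 3 degrees of freedom."""
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    # Closed form for one degree of freedom (n' = 2 in the notation of the text).
    p_one = math.erfc(math.sqrt(chi2 / 2.0))
    if dof == 1:
        return p_one
    if dof == 3:
        # Closed form for three degrees of freedom (n' = 4 in the notation of the text).
        return p_one + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-chi2 / 2.0)
    raise ValueError("this sketch handles only 1 or 3 degrees of freedom")


if __name__ == "__main__":
    value = 4.8  # a hypothetical chi-squared value, chosen only for illustration
    print("P with n'=2:", chi2_tail(value, 1))
    print("P with n'=4:", chi2_tail(value, 3))
```

For any positive value of $\chi^2$ the probability entered with $n'=4$ exceeds that entered with $n'=2$, which is why the choice matters in practice.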

But says my critic*, if they had, they would have got the same measure of improbability as if 
they had compared the difference of percentages! Quite so, and obviously so; for in taking 
percentages they have actually fixed their marginal totals taking 100 of each class and thus for 
the first time confined their attention to a limited class of samples, not the random sample of 
size M, which has not its marginal totals fixed. We have, indeed, reduced our degrees of freedom 
by two in taking ratios.
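The agreement that the critic points to can be checked numerically. The sketch below assumes the usual pooled-variance comparison of two percentages (an assumption; the text does not spell out the formula) and shows that its squared standardised difference coincides with the fourfold $\chi^2$ computed from the sample margins. The function names and the counts are hypothetical.

```python
def fourfold_chi2(a: float, b: float, c: float, d: float) -> float:
    """Fourfold chi-squared with the sample margins supplied for the population."""
    m = a + b + c + d
    return m * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))


def squared_percentage_difference(a: float, b: float, c: float, d: float) -> float:
    """Squared standardised difference of the two row proportions a/(a+b) and c/(c+d)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)  # pooled proportion in the first column
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    return (p1 - p2) ** 2 / variance


if __name__ == "__main__":
    # Hypothetical table with 100 in each row, echoing "taking 100 of each class".
    table = (30, 70, 45, 55)
    print(fourfold_chi2(*table), squared_percentage_difference(*table))
```

The two quantities are algebraically identical, so the disagreement in the text concerns which sampling scheme, and hence which value of $n'$, is appropriate, not the arithmetic.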

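When we consider generally the $\chi^2$ for a fourfold table to measure the improbability of a
sample we are really comparing the special sample

$$
\begin{array}{c|c|c}
a & b & a+b \\ \hline
c & d & c+d \\ \hline
a+c & b+d & M
\end{array}
\qquad \text{with} \qquad
\begin{array}{c|c|c}
a' & b' & a'+b' \\ \hline
c' & d' & c'+d' \\ \hline
a'+c' & b'+d' & M
\end{array}
$$

the general population, where in the latter case $a'd' = c'b'$.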


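Now the mean square contingency of the first of these tables is

$$
\begin{aligned}
\phi^2 &= \frac{1}{M}\left\{
\frac{\left(a - \dfrac{(a+b)(a+c)}{M}\right)^2}{\dfrac{(a+b)(a+c)}{M}}
+ \frac{\left(b - \dfrac{(a+b)(b+d)}{M}\right)^2}{\dfrac{(a+b)(b+d)}{M}}
+ \frac{\left(c - \dfrac{(a+c)(c+d)}{M}\right)^2}{\dfrac{(a+c)(c+d)}{M}}
+ \frac{\left(d - \dfrac{(c+d)(b+d)}{M}\right)^2}{\dfrac{(c+d)(b+d)}{M}}
\right\} \\
&= \left\{ \frac{a^2}{(a+b)(a+c)} + \frac{b^2}{(a+b)(b+d)} + \frac{c^2}{(a+c)(c+d)} + \frac{d^2}{(c+d)(b+d)} - 1 \right\} \\
&= \frac{(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}.
\end{aligned}
$$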
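The reduction of the cell-by-cell sum to a single product form can be verified numerically. The following is a minimal sketch with hypothetical function names and hypothetical counts; it computes the mean square contingency of a fourfold table both ways, with expected frequencies taken from the sample margins.

```python
def phi2_by_cells(a: float, b: float, c: float, d: float) -> float:
    """Mean square contingency: the cell-by-cell chi-squared sum divided by M."""
    m = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / m  # independence frequency
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2 / m


def phi2_closed_form(a: float, b: float, c: float, d: float) -> float:
    """Closed form: squared cross-product difference over the product of the margins."""
    return (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))


if __name__ == "__main__":
    table = (10, 20, 30, 40)  # hypothetical counts
    print(phi2_by_cells(*table), phi2_closed_form(*table))
```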





* Loc. cit. p. 90.










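But the $\chi^2$ is

$$
\begin{aligned}
\chi^2 &=
\frac{\left(a - \dfrac{(a'+b')(a'+c')}{M}\right)^2}{\dfrac{(a'+b')(a'+c')}{M}}
+ \frac{\left(b - \dfrac{(a'+b')(b'+d')}{M}\right)^2}{\dfrac{(a'+b')(b'+d')}{M}}
+ \frac{\left(c - \dfrac{(a'+c')(c'+d')}{M}\right)^2}{\dfrac{(a'+c')(c'+d')}{M}}
+ \frac{\left(d - \dfrac{(c'+d')(b'+d')}{M}\right)^2}{\dfrac{(c'+d')(b'+d')}{M}} \\
&= M\left\{ \frac{a^2}{(a'+b')(a'+c')} + \frac{b^2}{(a'+b')(b'+d')} + \frac{c^2}{(a'+c')(c'+d')} + \frac{d^2}{(c'+d')(b'+d')} - 1 \right\},
\end{aligned}
$$

there being three degrees of freedom, or we must take $n'=4$ in calculating the probability $P$. This
may be written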





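$$
\chi^2 = M\left\{ \frac{1}{M^2}\left( \frac{a^2}{p'_{1\cdot}\,p'_{\cdot 1}} + \frac{b^2}{p'_{1\cdot}\,p'_{\cdot 2}} + \frac{c^2}{p'_{2\cdot}\,p'_{\cdot 1}} + \frac{d^2}{p'_{2\cdot}\,p'_{\cdot 2}} \right) - 1 \right\},
$$

where $p'_{\cdot 1}$, $p'_{\cdot 2}$, $p'_{1\cdot}$ and $p'_{2\cdot}$ are the four percentage numbers of the marginal categories in the
sampled population (expressed here as proportions, e.g. $p'_{1\cdot} = (a'+b')/M$). Now we do not know
these percentages in that population and we do what every physicist, every astronomer, and (till
I saw the paper by my critic in the Journal of the Statistical Society I should have said) every
statistician does, supply the unknown constants from the sample, which leads us to

$$
\chi^2 = \frac{M\,(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)},
$$

as used in my memoir of 1912*.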

The problem I had and still have in view is the variability in samples of definite size—with 
no other restriction than sample size. The solution of that problem is absolutely comparable 
with that of any discussion of the probability of an observed result in the theory of probable 
errors. We have in the bulk of such cases constants involved which concern the distribution in 
an unknown population, and we supply those constants from the sample itself. 


As I have already noted the probable error of a mean is

$$
0.67449\,\frac{\sqrt{\mu_2' - \mu_1'^2}}{\sqrt{M}}.
$$

By this we understand that the means of samples restricted solely by their size $M$ from an
indefinitely large population of moment-coefficients $\mu_1'$, $\mu_2'$ about a fixed origin will have a
variability determined by the above formula. But when we proceed to give both $\mu_1'$ and $\mu_2'$ the
values determined from the sample we know, we do not add in the manner of my Royal Statistical
Society critic, “but in doing so the type of samples is reduced to those having the mean and
standard deviation of the sample.” If we did, this selection of samples would clearly have no
variation of mean or standard deviation at all! In fact probable errors would be meaningless,
unless we drew our samples from a population already fully known to us, in which case we should
not in 99% of cases want to sample it at all.
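The substitution described here, in which the unknown moment-coefficients of the population are supplied from the sample, can be sketched in a few lines. The function name and the observations are hypothetical; the factor 0.67449 is the one quoted in the text.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449  # factor quoted in the text


def probable_error_of_mean(sample: list[float]) -> float:
    """Probable error of the mean, with both moment-coefficients taken from the sample."""
    m = len(sample)
    if m == 0:
        raise ValueError("sample must not be empty")
    mu1 = sum(sample) / m                 # first moment about the origin
    mu2 = sum(x * x for x in sample) / m  # second moment about the origin
    variance = max(mu2 - mu1 * mu1, 0.0)  # guard against rounding below zero
    return PROBABLE_ERROR_FACTOR * math.sqrt(variance) / math.sqrt(m)


if __name__ == "__main__":
    observations = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical observations
    print(probable_error_of_mean(observations))
```

Using the sample moments in the formula does not restrict attention to samples that share those moments; the formula still describes the variability of means over all samples of size $M$.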


In the same way when we use the marginal totals of the sample in formulae like (β) we do not
thereby reduce our samples to those having constant marginal totals, we merely take the best
approximation available to the proper value of $\chi^2$, and the fact that $\chi^2$, as found from the sample,
is only an approximation to the true $\chi^2$ was fully recognised and discussed in my original memoir
in the Philosophical Magazine.


It only remains to say that the following sentence of my critic’s paper seems to me based 
upon a fallacious principle and apparently flows from a disregard of the nature of probable 
errors in general. 


“It should be pointed out that certain of Pearson’s Tables for Statisticians and Biometricians,
namely Tables XVII, XIX and XX, together with XXII (Abac to determine $r_P$) are all calculated


* On a novel method of regarding the association of two variates classed solely in alternative 
categories. Drapers’ Company Research Memoirs, Cambridge University Press. 










on the assumption that $n'=4$ in fourfold tables, and consequently should not be used when, as is
almost always the case, the marginal totals are obtained from the data” (loc. cit. p. 91). 


I hold those tables are quite correctly calculated for $n'=4$, and those who attempt to modify
them by assuming $n'=2$ will be dealing with an entirely different problem. Namely, they will
be considering not the improbability of the given sample as one of all possible samples of the
given size, which it really is, but one of the indefinitely smaller number of samples that have
fixed marginal totals. We do not find the probable error of $r$ for a tetrachoric table* on the
assumption that the marginal totals are fixed. We find it on the assumption that the marginal
totals also vary from sample to sample, and when we have found it, then we substitute in the
result the values of not only the marginal totals, but the cell-contents, $a$, $b$, $c$, $d$ of the sample
itself for those of the unknown population. With $\chi^2$ we go through an exactly similar process of
reasoning. If by this procedure we in some mysterious manner tied our degrees of freedom down
to the values of the cell-contents used in our formula and adopted from our sample there could
be no probable error for $r$, for the values of $a$, $b$, $c$, and $d$ are all required and used. I trust my
critic will pardon me for comparing him with Don Quixote tilting at the windmill; he must either 
destroy himself, or the whole theory of probable errors, for they are invariably based on using 
sample values for those of the sampled population unknown to us. For example here is an 
argument for Don Quixote of the simplest nature: In the $s$th category of a population $N$ the
frequency is $n_s$; a sample shows $m_s$ in a total $M$. The standard deviation of this frequency is

$$
\sqrt{M\,\frac{n_s}{N}\left(1 - \frac{n_s}{N}\right)}.
$$

But we don’t know the population sampled and accordingly obtain an approximate value of the
above standard deviation by writing for $n_s/N$, $m_s/M$ and taking for the standard deviation of $m_s$

$$
\sqrt{m_s\left(1 - \frac{m_s}{M}\right)}.
$$

In doing this it is not a question even of using a marginal total, we have used
a cell frequency found from our sample. We have therefore according to our critic reduced our
possibilities of freedom by selecting out of all possible samples those with $m_s$ in the $s$th cell; this
is exactly parallel to our reducing our freedom by “fixing” marginal proportions or moment-coefficients.
But if $m_s$ be fixed, it is ridiculous to talk of a variation of the $m_s$ frequency. Therefore
either $m_s = 0$ or $m_s = M$, or the usual theory and practice of probable errors are wholly at
fault. I think this will illustrate what I mean by Don Quixote and the windmill.
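The argument turns on the ordinary binomial standard deviation of a cell frequency with the sample proportion substituted for the unknown population proportion. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
import math


def cell_frequency_sd(m_s: float, total: float) -> float:
    """Approximate standard deviation of a cell frequency, using m_s/M for n_s/N."""
    if total <= 0 or not 0 <= m_s <= total:
        raise ValueError("need 0 <= m_s <= total and total > 0")
    proportion = m_s / total  # sample value substituted for the unknown n_s/N
    return math.sqrt(total * proportion * (1.0 - proportion))


if __name__ == "__main__":
    # Hypothetical sample: 25 individuals in the s-th category out of 100.
    print(cell_frequency_sd(25, 100))
    # The expression vanishes only at the two extremes named in the text.
    print(cell_frequency_sd(0, 100), cell_frequency_sd(100, 100))
```

The standard deviation is zero only when the cell is empty or holds the whole sample, which are the two cases the text names as the absurd consequence of treating $m_s$ as fixed.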


II. 


Is Tuberculosis to be regarded from the Aetiological Standpoint as an acute disease 
of Childhood? By Dr Kr. F. ANDVORD (Christiania). Tubercle, Vol. III. No. 3,
December, 1921. 


This paper is, we must confess, unconvincing. The author holds that in a community that 
has long been subject to tuberculosis the time of infection should be fixed in the infantile years 
for the great majority of cases and consequently we should protect children for the first three or 
four years from infection. 


As evidence of his views he takes a graph of what he calls a “population frame” which is 
really the well-known “number living in a stationary population” ($l_x$) and represents within this
graph the numbers dying from tuberculosis and the numbers who have suffered from it at each 
age. We are doubtful if his graphs for deaths are correctly drawn. They are made to rise 
suddenly for about a year and then fall till age 7 but we suspect that they should fall from birth 
till age 7. We cannot justify his chart (No. VIII) which gives the whole population and the 


* Phil. Trans. Vol. 195 A, p. 14.










tubercular population. The non-tubercular found by this chart actually increase after age 17 for 
many years so that the non-tubercular not only have no mortality but are increased by some 
process of resurrection! Admittedly the chart is hypothetical but as it stands it calls for 
amendment. 

Dr Andvord’s remark that “one would hardly gather from these per-thousand curves,” i.e. 
from rates of mortality for various ages, “that, as is really the case, more persons die from 
tuberculosis in the first and second years of life than in any subsequent age period” seems to 
betray an inexperience in matters related to a life table: this weakness is shown elsewhere, e.g. 
p. 102, where deaths are stated without populations and without reference to age distributions. 


Dr Andvord may have other evidence in support of his views but the article under review 
does not justify them statistically; we think every point he brings out could be explained as
well on other hypotheses. He cannot, moreover, completely prove his case till he has studied 
communities which become subject to infection after having been kept free from it. For if his 
theory be correct, the measures he proposes would necessarily produce such a community. 


W. PALIN ELDERTON.












Subscribers are reminded that subscriptions, namely 44s., to
Volume XV are now due and that this covers packing and postage
to any address. Payments may be made directly to Professor KARL
PEARSON, Biometric Laboratory, University College, London, W.C. 1,
England, or through any agent. Unless subscriptions are prepaid,
parts can only be bought at wrapper prices. Complete sets are
now very scarce, and can only be obtained when the purchasers are
public libraries or institutions, or scientists of distinction. Hence
in the case of such a purchase the ultimate locus must be stated.
The sale of Volumes IV to XIV is not yet restricted, but must be
so shortly.





